Small Ball and Discrepancy Inequalities 



Michael T. Lacey 



Michael Lacey, School of Mathematics, Georgia Institute of Technology, Atlanta, 
GA 30332 USA 

E-mail address: lacey@math.gatech.edu

Dedicated to the Memory of Walter Philipp, Teacher and Steadfast Friend 



Contents

Preface

Chapter 1. The Small Ball Problem
1.1. The Principal Conjecture
1.2. The Trivial Bounds
1.3. Proof of Talagrand's Theorem
1.4. Exponential Moments
1.5. Definitions and Initial Lemmas for Dimension Three
1.6. Jozsef Beck's Short Riesz Product
1.7. The Beck Gain in the Simplest Instance
1.8. Norm Estimates Particular to the Hyperbolic Assumption
1.9. The Beck Gain

Chapter 2. Irregularities of Distributions
2.1. Discrepancy
2.2. Conjectures for Discrepancy
2.3. Elementary Propositions
2.4. Proof of Schmidt's Theorem
2.5. Proof of Theorem 2.1.9
2.6. The $L^1$ bound in dimension 2

Chapter 3. Some Aspects of Harmonic Analysis
3.1. Exponential Orlicz Classes
3.2. Khintchine Inequalities
3.3. Maximal Function Estimates
3.4. Littlewood Paley Theory
3.5. Product Theory

Chapter 4. Other Applications: Approximation Theory and Probability Theory
4.1. Mixed Derivatives
4.2. The Brownian Sheet

References




Preface 



We discuss an inequality for three dimensional Haar functions motivated by questions 
in a range of areas. These are 

• Irregularity of Distributions of points in the unit cube, relative to boxes in the 
standard coordinate basis. 

• Chung's Law for the Brownian Sheet, or equivalently, sharp estimates for the 
probability that the Brownian Sheet has a small sup norm in the unit cube. 

• Lower bounds on the number of $L^\infty$ balls of small radius needed to cover certain
compact classes of functions with bounded mixed derivative in three dimensions. 

Of these three questions, the first admits the easiest description, and has the longest history, 
beginning with van Aardenne-Ehrenfest [2, 3], with significant contributions by a variety 
of authors over many years. See the first chapter of Beck and Chen [5]. Our methods are 
influenced by many of these contributions; the reader will find references to them in the 
pages below. Indeed, these notes are our effort to understand the famous contribution of 
Jozsef Beck [4] to the irregularities of distribution in three dimensions, and its connection 
with other questions in analysis. Along the way, we will simplify and extend his argument, 
in a manner that raises hopes that one could resolve the issue in three dimensions. 

The latter two problems listed above have a more sophisticated description, indeed 
one that admits an abstract formulation. The relationship between them is rather precise, 
and well known, [22,23]. 

These topics are unified by their methods of proofs. In its simplest manifestation, this 
is a particular inequality about Haar functions in three dimensions, a question which can 
be viewed as just beyond the reach of Littlewood Paley theory. We take this question as 
our main focus, as doing so will permit us to develop the necessary analytical tools with 
some efficiency. We establish a partial result in the direction of the main conjecture in the 
subject, Theorem 1.1.7. Afterwards, we discuss the other subjects above.

In the subject of Irregularities of Distribution, the principal new result is an extension 
of the result of Beck already cited, namely Theorem 2.1.9. The entire subject is also of 
interest in two dimensions; we include this theory in our notes, as it is the foundation 
from which one must generalize. The two dimensional case is substantially easier, and
all important elements of that theory have been developed; see [20, 28, 34, 35] among other
references listed in the paper below.

The central methods of this paper are those of Harmonic Analysis: Riesz products; 
Littlewood Paley inequalities; conditional expectation arguments; and product theory. 




These notes are written with a focus on these issues. (This is the area of expertise of
the author.) We have written a separate chapter recalling some of these basic issues;
see Chapter 3. As our subject touches a range of issues, we have also
included background material on Irregularities of Distributions, Approximation Theory, 
and Probability Theory. These are offered for the convenience of the reader, with the caveat 
that the author is not an expert in these subjects. 



Notation. The language and notation of probability and expectation is used throughout.
Thus,

    $\mathbb E f := \int_{[0,1]^d} f(x)\, dx$,

and $\mathbb P(A) = \mathbb E \mathbf 1_A$. This serves to keep formulas simpler. As well, certain conditional
expectation arguments are essential to us. We use the notation

    $\mathbb P(B \mid A) = \mathbb P(A)^{-1}\, \mathbb P(A \cap B)$.

For a sigma field $\mathcal F$,

    $\mathbb E(f \mid \mathcal F)$

is the conditional expectation of $f$ given $\mathcal F$. In all instances, $\mathcal F$ will be generated by a finite
collection of atoms $\mathcal F_{\mathrm{atoms}}$, in which case

    $\mathbb E(f \mid \mathcal F) = \sum_{A \in \mathcal F_{\mathrm{atoms}}} \mathbb P(A)^{-1}\, \mathbb E(f \mathbf 1_A) \cdot \mathbf 1_A$.

We suppress many constants which do not affect the arguments in essential ways.
$A \lesssim B$ means that there is an absolute constant $K$ so that $A \le K B$. Thus $A \lesssim 1$ means that $A$
is bounded by an absolute constant. And $A \simeq B$ means $A \lesssim B \lesssim A$.

Acknowledgment. Walter Philipp, my thesis advisor who passed away unexpectedly
in the summer of 2006, introduced me to this topic while I was in graduate school. Vladimir
Temlyakov led me through the theory that had been developed since graduate school
days. I report on joint work with Dmitry Bilyk. We have benefited from several conversations
with Mihalis Kolountzakis and Vladimir Temlyakov on this subject. A substantial
part of this manuscript was written while in residence at the University of Crete.


CHAPTER 1 



The Small Ball Problem 



1.1. The Principal Conjecture 



In one dimension, the class of dyadic intervals is $\mathcal D := \{[j 2^k, (j+1) 2^k) : j, k \in \mathbb Z\}$. Each
dyadic interval has a left and right half, indicated below, which are also dyadic. Define
the Haar functions

    $h_I := -\mathbf 1_{I_{\mathrm{left}}} + \mathbf 1_{I_{\mathrm{right}}}$.

Note that this is an $L^\infty$ normalization of these functions, which we will keep throughout
these notes. This will cause some formulas to look a little odd to readers accustomed to an
$L^2$ normalization for Haar functions.

In dimension $d$, a dyadic rectangle is a product of dyadic intervals, thus an element of $\mathcal D^d$.
A Haar function associated to $R$ we take to be the product of the Haar functions associated
with each side of $R$, namely

    $h_{R_1 \times \cdots \times R_d}(x_1, \dots, x_d) := \prod_{j=1}^{d} h_{R_j}(x_j)$.

This is the usual 'tensor' definition.^1

^1 Note that we are not claiming that these functions form a basis.

We will concentrate on rectangles with a fixed volume, and consider a local problem.
This is the 'hyperbolic' assumption, that pervades the subject. Our concern is the following
Theorem and Conjecture concerning a lower bound on the $L^\infty$ norm of sums of hyperbolic
Haar functions:

1.1.1. Talagrand's Theorem. For dimension $d = 2$, we have

(1.1.2)    $2^{-n} \sum_{|R| = 2^{-n}} |\alpha(R)| \lesssim \Big\| \sum_{|R| \ge 2^{-n}} \alpha(R)\, h_R \Big\|_\infty$.

Here, the sum on the right is taken over all rectangles with area at least $2^{-n}$.

1.1.3. Small Ball Conjecture. For dimension $d \ge 3$ we have the inequality

(1.1.4)    $2^{-n} \sum_{|R| = 2^{-n}} |\alpha(R)| \lesssim n^{\frac12 (d-2)} \Big\| \sum_{|R| \ge 2^{-n}} \alpha(R)\, h_R \Big\|_\infty$.




This conjecture is, by one square root, better than the trivial estimate available from
Cauchy-Schwarz, see § 1.2. As well, see that section for an explanation as to why the
conjecture is sharp. The motivations for the conjecture are indirect, a subject we return
to in the discussion of functions with $L^2$ mixed partials below, § 4.1. Nevertheless, we
have begun with this conjecture as it provides the quickest path to the essential technical
aspects behind the various conjectures of these notes.

The result in the case of $d = 2$ is that of Talagrand [34]. We will give the easier proof
of Temlyakov [35], which resonates with the ideas of Roth [25], Schmidt [28], and
Halasz [20]. Compare § 1.3 and § 2.4.
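
To make the definitions of this section concrete, here is a minimal Python sketch of the $L^\infty$ normalized Haar function of a dyadic interval and of the tensor Haar function of a dyadic rectangle. The function names `haar_1d` and `haar_rect`, and the encoding of an interval $[j2^k, (j+1)2^k)$ as the pair `(k, j)`, are choices made for this illustration only.

```python
import numpy as np

def haar_1d(k, j, x):
    """L-infinity normalized Haar function of I = [j 2^k, (j+1) 2^k):
    -1 on the left half of I, +1 on the right half, 0 off I."""
    left = j * 2.0 ** k
    length = 2.0 ** k
    mid = left + length / 2
    inside = (x >= left) & (x < left + length)
    return np.where(inside, np.where(x < mid, -1.0, 1.0), 0.0)

def haar_rect(sides, x):
    """Tensor Haar function of R = I_1 x ... x I_d.
    sides: list of (k, j) pairs, one per coordinate (illustrative encoding).
    x: array of shape (N, d) of sample points."""
    out = np.ones(x.shape[0])
    for t, (k, j) in enumerate(sides):
        out *= haar_1d(k, j, x[:, t])
    return out

# Sample R = [0, 1/2) x [1/4, 1/2) at the midpoints of a 16 x 16 grid.
N = 16
g = (np.arange(N) + 0.5) / N
pts = np.array([(a, b) for a in g for b in g])
h = haar_rect([(-1, 0), (-2, 1)], pts)
print(h.mean(), (h ** 2).mean())  # mean of h and the volume |R|
```

With this normalization $h_R^2 = \mathbf 1_R$, so the mean of $h_R^2$ over the unit square is the volume $|R|$, while $h_R$ itself has mean zero.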

For many applications of interest, one can restrict attention to this version of the 
conjecture 

1.1.5. Restricted Small Ball Conjecture. We have the inequality (1.1.2), or (1.1.4), in the
case where the coefficients $\alpha(R) \in \{0, \pm 1\}$, for $|R| = 2^{-n}$, and about half of the $\alpha(R) \neq 0$. Namely,
under these assumptions on the coefficients $\alpha(R)$ we have the inequality

(1.1.6)    $\Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_\infty \gtrsim n^{d/2}$.

It is possible that the proof would simplify considerably, and be of interest, if one in
addition assumed that $|\alpha(R)| = 1$. But some of the applications may not be available in this
case.

The principal point of these notes is to expound on the three dimensional case, providing
a partial resolution of this case. We extend and simplify an approach of J. Beck [4],
establishing this result.

1.1.7. Theorem. In dimension $d = 3$, there is a small positive $\varepsilon > 0$ for which we have the
estimate

(1.1.8)    $2^{-n} \sum_{|R| = 2^{-n}} |\alpha_R| \lesssim n^{1-\varepsilon} \Big\| \sum_{|R| = 2^{-n}} \alpha_R\, h_R \Big\|_\infty$.

This result is due to Bilyk and Lacey [1]. Beck [4] established this inequality with $n^{-\varepsilon}$
replaced by a term logarithmic in $n$.^2

The organization of the proof, at the highest level, and outlined in § 1.6, is that of Jozsef 
Beck [4]. At the same time, both the exact construction and subsequent details are in many 
respects easier than in Beck's paper. In particular, the construction in that section is a 
Riesz product construction, following the lines of § 1.3. But, the product, with our current 
understanding, must be taken to be 'short,' a dictation to us from the third dimension: The 
'product rule' 1.3.1 does not hold in dimension three. This unfortunate, and critical fact, 
forces the definition of 'strongly distinct' on us. See Definition 1.5.5. 

Critically, Jozsef Beck observed that in the case that the 'strongly distinct' condition does not
hold, there is a gain over naive estimates. See Lemma 1.7.2 and Theorem 1.9.1. We will
refer to any instance of this phenomenon as the Beck Gain. The simplest instance of this is
discussed in detail in § 1.7. Here, we obtain a better range of results, and a larger gain,
than Beck.

^2 J. Beck did not state the result this way, as the principal concern of that paper is the question of
irregularities of distribution. See § 2.1.

Beck's insight is that this gain permits one to carry out a proof, provided the Riesz 
product is sufficiently short, so short that the combinatorial explosion generated by the 
expansion of the Riesz product does not overwhelm the gain. 

Beck's gain has other surprising implications, namely in § 1.8 we see that hyperbolic
sums of Haar functions obey a range of sub-gaussian estimates,^3 not predicted by the
general theory in § 1.4. That section employs a conditional expectation argument to permit
an effective application of the Beck gain.

Concerning the value of $\varepsilon$ for which our Theorem holds, it is computable, but we do not
carry out this step, as the particular $\varepsilon$ we would obtain is certainly not optimal. Instead,
the point of this proof is that the methods pioneered by Jozsef Beck are more powerful
than originally suspected. We expect more efficient organizations of the proof will yield
quantifiable and substantive improvements to the results of this paper.



1.2. The Trivial Bounds 

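The inequality (1.1.2) with an extra square root of $n$ is easy to prove.

1.2.1. Lemma. It is the case that

    $\sum_{|R| = 2^{-n}} |\alpha(R)| \cdot |R| \lesssim n^{\frac12 (d-1)} \Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_\infty$.

Proof. Each point $x \in [0,1]^d$ is in at most a constant multiple of $n^{d-1}$ possible rectangles. This is the essential
point dictated by the hyperbolic nature of the problem. Using this, and the Cauchy-Schwarz
inequality, we have

    $\sum_{|R| = 2^{-n}} |\alpha(R)| \cdot |R| = \Big\| \sum_{|R| = 2^{-n}} |\alpha(R)|\, \mathbf 1_R \Big\|_1$
        $\lesssim n^{\frac12 (d-1)} \Big\| \Big[ \sum_{|R| = 2^{-n}} |\alpha(R)|^2\, \mathbf 1_R \Big]^{1/2} \Big\|_1$
        $\le n^{\frac12 (d-1)} \Big\| \Big[ \sum_{|R| = 2^{-n}} |\alpha(R)|^2\, \mathbf 1_R \Big]^{1/2} \Big\|_2$
        $= n^{\frac12 (d-1)} \Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_2$
        $\le n^{\frac12 (d-1)} \Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_\infty$.

The equality in the fourth line uses the orthogonality of the Haar functions and $h_R^2 = \mathbf 1_R$. □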



^3 This observation is not essential to our main theorem.

Let us also see that the Small Ball Conjecture is sharp. Indeed, we take the $\alpha(R)$ to be
random choices of signs. It is immediate that

    $2^{-n} \sum_{|R| = 2^{-n}} |\alpha_R| \simeq n^{d-1}$.

On the other hand, we turn to properties of Rademachers outlined in Chapter 3. For
fixed $x \in [0,1]^d$ we have

    $\mathbb E \Big| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R(x) \Big| \simeq n^{\frac12 (d-1)}$.

It is also well known that sums of Rademacher random variables obey a sub-Gaussian
distributional estimate. The supremum of such sums admits easily estimated upper bounds.
In particular, it is enough to test the $L^\infty$ norm of the sum at a grid of $2^{nd}$ points in the unit
cube, hence we have

    $\mathbb E \Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_\infty \lesssim \sqrt{\log 2^{nd}} \cdot \sup_x \mathbb E \Big| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R(x) \Big| \lesssim n^{d/2}$.

Comparing these two estimates shows that the Small Ball Conjecture is sharp.

The Small Ball Conjecture could be substantially resolved if one could directly show
that this estimate is sharp in the random case.
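
The quantities in this section can be computed exactly for small $n$. The sketch below, in dimension $d = 3$, builds a random-sign hyperbolic sum on the grid of cell midpoints of mesh $2^{-(n+1)}$, on which every $h_R$ with $|R| = 2^{-n}$ is constant, and returns the number of vectors $\vec r$ with $r_1 + r_2 + r_3 = n$ (which equals $2^{-n}\sum |\alpha(R)|$ for signs), the sup norm, and the $L^2$ norm. The function names and the use of NumPy are choices of this illustration, not part of the notes.

```python
import itertools
import numpy as np

def hyperbolic_vectors(n, d=3):
    """All r in N^d with r_1 + ... + r_d = n."""
    return [r for r in itertools.product(range(n + 1), repeat=d) if sum(r) == n]

def random_sign_sum(n, seed=0):
    """Random-sign sum of Haar functions over |R| = 2^-n in [0,1)^3,
    sampled on the 2^(n+1) x 2^(n+1) x 2^(n+1) grid of cell midpoints."""
    rng = np.random.default_rng(seed)
    N = 2 ** (n + 1)
    idx = np.arange(N)
    H = np.zeros((N, N, N))
    vectors = hyperbolic_vectors(n)
    for r in vectors:
        # one random sign per rectangle with side lengths 2^-r_1, 2^-r_2, 2^-r_3
        signs = rng.choice([-1.0, 1.0], size=tuple(2 ** rt for rt in r))
        # index of the dyadic interval of length 2^-r_t containing each grid point
        blocks = [idx >> (n + 1 - rt) for rt in r]
        # -1 on the left half of that interval, +1 on the right half
        halves = [np.where((idx >> (n - rt)) & 1, 1.0, -1.0) for rt in r]
        a = signs[np.ix_(*blocks)]
        haar = (halves[0][:, None, None]
                * halves[1][None, :, None]
                * halves[2][None, None, :])
        H += a * haar
    sup_norm = np.abs(H).max()
    l2_norm = np.sqrt((H ** 2).mean())
    return len(vectors), sup_norm, l2_norm

count, sup_norm, l2_norm = random_sign_sum(4)
print(count, sup_norm, l2_norm)
```

By orthogonality the $L^2$ norm is the square root of the count, so the sup norm always lies between the square root of the count and the count itself; the conjecture concerns where in that range it must fall.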

1.3. Proof of Talagrand's Theorem 

We follow the approach of V. Temlyakov [35] to the stronger inequality (1.1.4) in the 
case of d = 2, and invite the reader to compare this argument to the proof of Schmidt's 
Theorem in § 2.4. 

The decisive point in two dimensions is that one has a 'product rule.' Let us formalize 
it as this proposition, and leave the proof to the reader. 

1.3.1. Product Rule in Dimension 2. Let $R, R'$ be two dyadic rectangles of the same area.
Then,

    $h_R \cdot h_{R'} \in \{0,\ \mathbf 1_R,\ \pm h_{R \cap R'}\}$.

More generally, let $R_1, R_2, \dots, R_k$ be dyadic rectangles of equal area and distinct lengths in e.g.
their first coordinates. Then

    $\prod_{j=1}^{k} h_{R_j} \in \{0,\ \pm h_{R_1 \cap \cdots \cap R_k}\}$.
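
The product rule can be checked by brute force for small $n$. The following sketch enumerates all pairs of dyadic rectangles in the unit square of area $2^{-n}$ whose first sides have different lengths, and confirms that the product of their Haar functions is either zero or plus or minus the Haar function of the intersection. The helper names `haar_grid` and `check_product_rule` are invented for this illustration.

```python
import numpy as np

def haar_grid(r1, j1, r2, j2, n):
    """Values of h_R on the 2^(n+1) x 2^(n+1) grid of cell midpoints in [0,1)^2,
    for R = [j1 2^-r1, (j1+1) 2^-r1) x [j2 2^-r2, (j2+1) 2^-r2)."""
    N = 2 ** (n + 1)
    pts = (np.arange(N) + 0.5) / N

    def h(r, j):
        u = pts * 2 ** r - j  # relative position inside the interval
        return np.where((u >= 0) & (u < 1), np.where(u < 0.5, -1.0, 1.0), 0.0)

    return np.outer(h(r1, j1), h(r2, j2))

def check_product_rule(n):
    """True if h_R h_R' is 0 or +-h_{R cap R'} for all pairs of area 2^-n
    with distinct first side lengths."""
    rects = [(r1, j1, n - r1, j2)
             for r1 in range(n + 1)
             for j1 in range(2 ** r1)
             for j2 in range(2 ** (n - r1))]
    for (r1, j1, r2, j2) in rects:
        for (s1, k1, s2, k2) in rects:
            if r1 >= s1:
                continue  # keep pairs where R has the longer first side
            prod = haar_grid(r1, j1, r2, j2, n) * haar_grid(s1, k1, s2, k2, n)
            # if the rectangles meet, the intersection takes the shorter side
            # in each coordinate: first side from R', second side from R
            h_cap = haar_grid(s1, k1, r2, j2, n)
            ok = (not prod.any()
                  or np.array_equal(prod, h_cap)
                  or np.array_equal(prod, -h_cap))
            if not ok:
                return False
    return True

print(check_product_rule(3))
```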

The proof of (1.1.4) is by duality. Fix

    $H := \sum_{|R| \ge 2^{-n}} \alpha(R)\, h_R$.

We will construct a function $\Psi$ with $L^1$ norm at most 1, for which the inner product

    $\langle H, \Psi \rangle = 2^{-n-1} \sum_{|R| = 2^{-n}} |\alpha(R)|$.

This clearly implies the Theorem. Moreover, the function is defined as a Riesz product.
Our Riesz product is

    $\Psi := \prod_{s=1}^{n} (1 + \psi_s)$,

    $\psi_s = \sum_{R \,:\, |R_1| = 2^{-s},\ |R_2| = 2^{-n+s}} \operatorname{sgn}(\alpha(R))\, h_R$.

Of course $\Psi$ is non-negative. Moreover, it has $L^1$ norm one: Expanding the product, the
leading term is 1. All products of the $\psi_s$ are, by Proposition 1.3.1, a sum of Haar functions,
hence have mean zero.

The Proposition also implies that

    $\langle H, \Psi \rangle = \sum_{s=1}^{n} \langle H, \psi_s \rangle = 2^{-n-1} \sum_{|R| = 2^{-n}} |\alpha(R)|$.

The proof is complete.
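
The duality argument can be tested numerically in dimension two. The sketch below is an illustration under its own conventions: it takes $H$ to be the sum over $|R| = 2^{-n}$ only, lets the product run over all $n+1$ shapes $s = 0, \dots, n$, and uses Gaussian coefficients; with these choices and the $L^\infty$ normalization the pairing comes out as $2^{-n}\sum|\alpha(R)|$, so the absolute constant differs from the display above, which is immaterial for the inequality. The function name `riesz_product_demo` is invented here.

```python
import numpy as np

def riesz_product_demo(n, seed=0):
    """Build H and the Riesz product Psi on the 2^(n+1) x 2^(n+1) midpoint grid.
    Returns (2^-n sum|alpha|, <H, Psi>, min Psi, mean Psi, sup|H|)."""
    rng = np.random.default_rng(seed)
    N = 2 ** (n + 1)
    idx = np.arange(N)

    def block(r):
        # index of the dyadic interval of length 2^-r containing each grid point
        return idx >> (n + 1 - r)

    def half_sign(r):
        # -1 on the left half of that interval, +1 on the right half
        return np.where((idx >> (n - r)) & 1, 1.0, -1.0)

    H = np.zeros((N, N))
    Psi = np.ones((N, N))
    total_abs = 0.0
    for r1 in range(n + 1):
        r2 = n - r1
        alpha = rng.normal(size=(2 ** r1, 2 ** r2))  # coefficients alpha(R)
        a = alpha[np.ix_(block(r1), block(r2))]      # alpha(R) of the R containing each cell
        haar = np.outer(half_sign(r1), half_sign(r2))
        H += a * haar
        Psi *= 1.0 + np.sign(a) * haar               # factor 1 + psi_s, equal to 0 or 2
        total_abs += np.abs(alpha).sum()
    lhs = total_abs * 2.0 ** (-n)
    pairing = (H * Psi).mean()
    return lhs, pairing, Psi.min(), Psi.mean(), np.abs(H).max()

lhs, pairing, psi_min, psi_mean, sup_norm = riesz_product_demo(5)
print(lhs, pairing, psi_min, psi_mean, sup_norm)
```

Since $\Psi \ge 0$ has mean one, the pairing is a lower bound for the sup norm of $H$, which is the content of the duality step.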

Remark. If one considers the case of $\alpha(R) = 1$, it is clear that the $L^\infty$ norm is achieved,
or nearly achieved, on a set of measure approximately $2^{-cn}$. That is, the supremum is
achieved on a very thin set. Experience shows that Riesz products are very useful in such
situations.

Remark. Traditionally, a Riesz product is of the form

    $\prod_{k=1}^{\infty} (1 + \cos 4^k x)$.

By a well known heuristic, the functions $\cos 4^k x$ behave as independent random variables,
so we don't make a distinction between the classical Riesz product and the Riesz products of
our proofs. Using Riesz products as above has a long history in the subject of irregularities
of distributions.

1.4. Exponential Moments 

We state a distributional estimate for sums of hyperbolic Haars which shapes the 
potential forms of approach to the Small Ball Conjecture. However, while the estimates 
we describe here are in general sharp, they admit certain improvements, for small p; see 
§1.8. 

Background on these issues is developed in Chapter 3.

1.4.1. Theorem. In dimension $d \ge 2$ we have the estimate below, phrased in terms of the
exponential Orlicz Lebesgue spaces.

(1.4.2)    $\Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_{\exp(L^{2/(d-1)})} \lesssim \Big\| \Big[ \sum_{|R| = 2^{-n}} |\alpha(R)|^2\, \mathbf 1_R \Big]^{1/2} \Big\|_\infty$.

Remark. The estimates above, specialized to hyperbolic sums in dimension 3 or higher,
are better than those that appear in the literature associated to the Discrepancy function.

Here we are using a typical definition of the exponential integrability classes, as given
in § 3.1. This definition could be for instance

(1.4.3)    $\|X\|_{\exp(L^\alpha)} \simeq \sup_{p \ge 1} p^{-1/\alpha} \|X\|_p$,

the equivalence holding on any probability space.

Of principal relevance to us is the three dimensional case, where the estimate above 
asserts that the hyperbolic sums are exponentially integrable. 

Proof. The tool is the vector valued Littlewood Paley inequality, with sharp rate of
growth in the constants as $p \to \infty$. As such the proof is a standard one, see [19,24].

Applying the one dimensional Littlewood Paley inequality in the coordinate $x_1$ we see
that

    $\Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_p \lesssim \sqrt p\, \Big\| \Big[ \sum_{r_1 = 1}^{n} \Big| \sum_{|R| = 2^{-n},\ |R_1| = 2^{-r_1}} \alpha(R)\, h_R \Big|^2 \Big]^{1/2} \Big\|_p$.

If we are in dimension 2, note that

(1.4.4)    $\Big| \sum_{|R| = 2^{-n},\ |R_1| = 2^{-r_1}} \alpha(R)\, h_R \Big|^2 = \sum_{|R| = 2^{-n},\ |R_1| = 2^{-r_1}} |\alpha(R)|^2\, \mathbf 1_R$,

so our proof is complete in this case.

In the higher dimensional case, the key point is to observe that the last term can be
viewed as an $\ell^2$ space valued function. Then, the Hilbert space analog of the Littlewood
Paley inequalities applies to the second coordinate, to give us

    $\Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_p \lesssim p\, \Big\| \Big[ \sum_{r_1 = 1}^{n} \sum_{r_2 = 1}^{n} \Big| \sum_{|R| = 2^{-n},\ |R_j| = 2^{-r_j},\ j = 1,2} \alpha(R)\, h_R \Big|^2 \Big]^{1/2} \Big\|_p$.

Observe that we have a full power of $p$, due to the two applications of the Littlewood Paley
inequalities. And if $d = 3$, then the analog of (1.4.4) holds, completing the proof in this case.

In the case of dimension $d \ge 4$ note that we can continue applying the Littlewood
Paley inequalities inductively. They need only be used $d - 1$ times due to the hyperbolic
assumption. Thus, we have the inequality

    $\Big\| \sum_{|R| = 2^{-n}} \alpha(R)\, h_R \Big\|_p \lesssim p^{(d-1)/2}\, \Big\| \Big[ \sum_{|R| = 2^{-n}} |\alpha(R)|^2\, \mathbf 1_R \Big]^{1/2} \Big\|_p$,    $2 \le p < \infty$.

The implied constant depends upon dimension; the main point we are interested in is
the rate of growth of the $L^p$ norms. Assuming that the Square Function of the sum is
bounded in $L^\infty$, the $L^p$ norms can only grow at the rate of $p^{(d-1)/2}$, which completes the
proof. □

Remark. It is a thesis of A. Zygmund that when one is concerned with product domain 
questions, the relevant estimates are governed by the effective number of parameters 
involved. This thesis, in the hyperbolic setting, says that relevant estimates should be
those of d - 1 parameters in dimension d. We have just seen one instance of this. While it 
is known that this thesis does not hold in full generality, the hyperbolic setting is simple 
enough that it should hold for most, if not all, questions of interest. 

1.5. Definitions and Initial Lemmas for Dimension Three 

The principal difficulty in three and higher dimensions is that the product of Haar functions
is not necessarily a Haar function. On this point, we have the following proposition
which does not admit any essential extension.

1.5.1. Proposition. Suppose that $R_1, \dots, R_k$ are rectangles such that there is no choice of
$1 \le j < j' \le k$ and no choice of coordinate $1 \le t \le d$ for which we have $R_{j,t} = R_{j',t}$. Then, for a
choice of sign $\varepsilon \in \{\pm 1\}$ we have

(1.5.2)    $\prod_{j=1}^{k} h_{R_j} = \varepsilon\, h_S$,    $S = \bigcap_{j=1}^{k} R_j$.

Proof. Expand the product as

    $\prod_{m=1}^{\ell} h_{R_m}(x_1, \dots, x_d) = \prod_{m=1}^{\ell} \varepsilon_m \prod_{t=1}^{d} h_{R_{m,t}}(x_t)$.

Here $\varepsilon_m \in \{\pm 1\}$. Our assumption is that for each $t$, there is exactly one choice of $1 \le m_0 \le \ell$
such that $R_{m_0,t} = S_t$. And moreover, since the minimum value of $|R_{m,t}|$ is obtained exactly
once, for $m \ne m_0$, we have that $h_{R_{m,t}}$ is constant on $S_t$. Thus, in the $t$ coordinate, the product
is

    $\varepsilon_{m_0}\, h_{S_t}(x_t) \prod_{1 \le m \ne m_0 \le \ell} \varepsilon_m\, h_{R_{m,t}}(S_t)$.

This proves our Lemma. □

Let $\vec r \in \mathbb N^d$ be a partition of $n$, thus $\vec r = (r_1, r_2, r_3)$, where the $r_j$ are non negative integers
and $|\vec r| := \sum_t r_t = n$. Denote all such vectors as $\mathbb H_n$. ('H' for 'hyperbolic.') For a vector $\vec r$ let $\mathcal R_{\vec r}$
be all dyadic rectangles $R$ such that for each coordinate $k$, $|R_k| = 2^{-r_k}$.

1.5.3. Definition. We call a function $f$ an r function with parameter $\vec r$ if

(1.5.4)    $f = \sum_{R \in \mathcal R_{\vec r}} \varepsilon_R\, h_R$,    $\varepsilon_R \in \{\pm 1\}$.

We will use $f_{\vec r}$ to denote a generic r function. A fact used without further comment is that
$f_{\vec r}^2 = 1$.

1.5.5. Definition. For vectors $\vec r_j \in \mathbb N^3$, say that $\vec r_1, \dots, \vec r_J$ are strongly distinct iff for
coordinates $1 \le t \le 3$ the integers $\{r_{j,t} : 1 \le j \le J\}$ are distinct. The product of strongly
distinct r functions is also an r function.

The r functions we are interested in are:

(1.5.6)    $f_{\vec r} = \sum_{R \in \mathcal R_{\vec r}} \operatorname{sgn}(\alpha(R))\, h_R$.
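
The index set $\mathbb H_n$ and the strongly distinct condition of Definition 1.5.5 are easy to enumerate. The following sketch, with function names chosen for this illustration, lists the vectors $\vec r \in \mathbb N^3$ with $|\vec r| = n$ and tests whether a family of such vectors is strongly distinct.

```python
from math import comb

def hyperbolic_vectors(n):
    """All (r1, r2, r3) of non negative integers with r1 + r2 + r3 = n."""
    return [(r1, r2, n - r1 - r2)
            for r1 in range(n + 1)
            for r2 in range(n + 1 - r1)]

def strongly_distinct(vectors):
    """True iff, in each of the three coordinates, the entries of the
    given vectors are pairwise distinct."""
    return all(len({v[t] for v in vectors}) == len(vectors) for t in range(3))

H5 = hyperbolic_vectors(5)
print(len(H5), comb(7, 2))                        # number of vectors vs. binomial count
print(strongly_distinct([(1, 2, 2), (2, 3, 0)]))  # distinct in every coordinate
print(strongly_distinct([(1, 2, 2), (3, 2, 0)]))  # agree in the second coordinate
```

The count of $\mathbb H_n$ is the binomial coefficient $\binom{n+2}{2}$, of order $n^2$, which is the two-parameter count behind the hyperbolic assumption in dimension three.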

1.6. Jozsef Beck's Short Riesz Product 

Let us define relevant parameters by

(1.6.1)    $q = a n^{\varepsilon}$,    $b = \tfrac14$,

(1.6.2)    $\tilde\rho = a q^{b} n^{-1}$,    $\rho = \sqrt q\, n^{-1}$.

Here, $a$ is a small positive constant; we use the notation $b = 1/4$ throughout, so as
not to obscure those aspects of the argument that dictate this choice of $b$. $\tilde\rho$ is a
'false' $L^2$ normalization for the sums we consider, while the larger term $\rho$ is the 'true' $L^2$
normalization. Our 'gain over the trivial estimate' in the Small Ball Conjecture is $q^b \simeq n^{\varepsilon/4}$.
Here $0 < \varepsilon < 1$ is a small constant. It certainly can't be more than 1/6 in view of (1.8.3), though
there are other more severe restrictions on the size of $\varepsilon$; the exact determination of what
we could take $\varepsilon$ equal to in this proof doesn't seem to be worth calculating.

In Beck's paper, the value of $q = q_{\mathrm{Beck}} = \frac{\log n}{\log \log n}$ was much smaller than our value
of $q$. The point of this choice is that $q_{\mathrm{Beck}}^{q_{\mathrm{Beck}}} \simeq n$, with the term $q^q$ controlling many of the
combinatorial issues concerning the expansion of the Riesz product.^4 With our substantially
larger value of $q$, we need to introduce additional tools to control the combinatorics. These
tools are

• A Riesz product that will permit us to implement various conditional expectation
arguments.

• Attention to $L^p$ estimates of various sums, and their growth rates in $p$.

• Systematic use of the Littlewood Paley inequalities, with the sharp exponents in $p$.

^4 Specifically, $q^{Cq}$ is a naive bound for the number of admissible graphs, as defined in § 1.9.

Divide the integers $\{1, 2, \dots, n\}$ into $q$ disjoint intervals $I_1, \dots, I_q$, and let $A_t := \{\vec r \in \mathbb H_n : r_1 \in I_t\}$. Let

(1.6.3)    $F_t = \sum_{\vec r \in A_t} f_{\vec r}$.

The Riesz product is now a 'short product.'

    $\Psi := \prod_{t=1}^{q} (1 + \tilde\rho F_t)$.


Note the subtle way that the false L 2 normalization enters into the product. It means that 
the product is, with high probability, positive. And of course, for a positive function F, 
we have EF = ||F||i, with expectations being typically easier to estimate. This heuristic is 
made precise below. 

We need to decompose the product $\Psi$ into

(1.6.4)    $\Psi = 1 + \Psi^{\mathrm{sd}} + \Psi^{\neg}$,

where the two pieces are the 'strongly distinct' and 'not strongly distinct' pieces. To be
specific, for integers $1 \le k \le q$, let

    $\Psi^{\mathrm{sd}}_k := \tilde\rho^{\,k} \sum_{1 \le v_1 < \cdots < v_k \le q}\ \sideset{}{^{\mathrm{sd}}}\sum_{\vec r_t \in A_{v_t}}\ \prod_{t=1}^{k} f_{\vec r_t}$,

where $\sum^{\mathrm{sd}}$ is taken to be over all $k$ tuples of vectors $(\vec r_1, \dots, \vec r_k) \in \prod_{t=1}^{k} A_{v_t}$ such that:

(1.6.5)    the vectors $\{\vec r_t : 1 \le t \le k\}$ are strongly distinct.

Then define

(1.6.6)    $\Psi^{\mathrm{sd}} := \sum_{k=1}^{q} \Psi^{\mathrm{sd}}_k$.

With this definition, it is clear that we have

(1.6.7)    $\langle H_n, \Psi^{\mathrm{sd}} \rangle = \langle H_n, \Psi^{\mathrm{sd}}_1 \rangle \gtrsim q^{b} \cdot n$,

so that $q^b$ is our 'gain over the trivial estimate.'

The bulk of the proof is taken up with the proof of the technical estimates below. The
main point of the Lemma is the last estimate, (1.6.14), which with (1.6.7) above proves
Theorem 1.1.7.

1.6.8. Lemma. We have these estimates:

(1.6.9)     $\mathbb P(\Psi < 0) \lesssim \exp(-A q^{1-2b})$;

(1.6.10)    $\|\Psi\|_2 \lesssim \exp(a' q^{2b})$;

(1.6.11)    $\mathbb E \Psi = 1$;

(1.6.12)    $\|\Psi\|_1 \lesssim 1$;

(1.6.13)    $\|\Psi^{\neg}\|_1 \lesssim 1$;

(1.6.14)    $\|\Psi^{\mathrm{sd}}\|_1 \lesssim 1$.

Here, $0 < a' < 1$, in (1.6.10), is a small constant, decreasing to zero as $a$ in (1.6.1) goes to zero; and
$A > 1$, in (1.6.9), is a large constant, tending to infinity as $a$ in (1.6.1) goes to zero.

Proof. We give the proof of the Lemma, assuming our main inequalities proved in the
subsequent sections. In particular, the first two estimates of our Lemma are substantial, as
they reflect the influence of the non trivial sub-gaussian estimates of § 1.8.

Proof of (1.6.9). The main tool is the distributional estimate (1.8.3). Observe that

    $\mathbb P(\Psi < 0) \le \sum_{t=1}^{q} \mathbb P(\tilde\rho F_t < -1)$
        $= \sum_{t=1}^{q} \mathbb P(\rho F_t < -a^{-1} q^{1/2 - b})$
        $\lesssim q \exp(-c\, a^{-1} q^{1-2b})$.

Proof of (1.6.10). The proof of this is detailed enough that we postpone it to Lemma 1.8.5
below.

Proof of (1.6.11). Expand the product in the definition of $\Psi$. The leading term is one.
Every other term is a product

    $\prod_{k \in V} \tilde\rho F_k$,

where $V$ is a non-empty subset of $\{1, \dots, q\}$. This product is in turn a sum of products of r functions.
Among each such product, the maximum in the first coordinate is unique. This fact tells us that
the expectation of this product of r functions is zero. So the expectation of the product
above is zero. The proof is complete.

Proof of (1.6.12). We use the first two estimates of our Lemma. Observe that

    $\|\Psi\|_1 = \mathbb E \Psi - 2\, \mathbb E \Psi \mathbf 1_{\Psi < 0}$
        $\le 1 + 2\, \mathbb P(\Psi < 0)^{1/2} \|\Psi\|_2$
        $\lesssim 1 + \exp(-A q^{1-2b}/2 + a' q^{2b})$.

We have taken $b = 1/4$ so that $1 - 2b = 2b$. For sufficiently small $a$ in (1.6.1), we will have
$A \gtrsim a'$. We see that (1.6.12) holds.^5

^5 Here of course we are strongly using the fact that $\Psi$ is positive with high probability.

Indeed, Lemma 1.8.5 proves a uniform estimate, namely

    $\sup_{V \subset \{1, \dots, q\}} \mathbb E \prod_{v \in V} (1 + \tilde\rho F_v)^2 \lesssim \exp(a' q^{2b})$.

Hence, the argument above proves

(1.6.15)    $\sup_{V \subset \{1, \dots, q\}} \Big\| \prod_{v \in V} (1 + \tilde\rho F_v) \Big\|_1 \lesssim 1$.



Proof of (1.6.13). The primary facts are (1.6.15) and Theorem 1.9.1; we use the notation
devised for that Theorem.

Note that the Inclusion Exclusion principle gives us the identity

    $\Psi^{\neg} = \sum_{V \subset \{1, \dots, q\},\ |V| \ge 2} (-1)^{|V|+1}\, \mathrm{Prod}(\mathrm{NSD}(V)) \cdot \prod_{t \in \{1, \dots, q\} - V} (1 + \tilde\rho F_t)$.

We use the triangle inequality, the estimates of Lemma 1.8.5, Hölder's inequality, with
indices $1 + 1/q^{2b}$ and $q^{2b}$, and the estimate of (1.9.2) in the calculation below. Notice that
we have

    $\sup_{V \subset \{1, \dots, q\}} \Big\| \prod_{v \in V} (1 + \tilde\rho F_v) \Big\|_{1 + q^{-2b}} \le \sup_{V \subset \{1, \dots, q\}} \Big\| \prod_{v \in V} (1 + \tilde\rho F_v) \Big\|_1^{(1 - q^{-2b})/(1 + q^{-2b})} \times \Big\| \prod_{v \in V} (1 + \tilde\rho F_v) \Big\|_2^{2 q^{-2b}/(1 + q^{-2b})} \lesssim 1$.

And recall that $q^{2b} = q^{1/2}$ is a small power of $n$. So the $L^p$ norms that we need on terms
arising from $\mathrm{NSD}(V)$ below are for moderate values of $p$, namely we only need $p \lesssim q^{2b}$.
This is a key reason why we can control the combinatorial explosion associated with our
short Riesz product.

We estimate

    $\|\Psi^{\neg}\|_1 \le \sum_{V \subset \{1, \dots, q\},\ |V| \ge 2} \Big\| \mathrm{Prod}(\mathrm{NSD}(V)) \cdot \prod_{t \in \{1, \dots, q\} - V} (1 + \tilde\rho F_t) \Big\|_1$
        $\lesssim \sum_{V \subset \{1, \dots, q\},\ |V| \ge 2} \|\mathrm{Prod}(\mathrm{NSD}(V))\|_{q^{2b}}\, \Big\| \prod_{t \in \{1, \dots, q\} - V} (1 + \tilde\rho F_t) \Big\|_{1 + q^{-2b}}$
        $\lesssim \sum_{v=2}^{q} \big[ q^{C'} n^{-1/6} \big]^{v}$
        $\lesssim n^{C'' \varepsilon - 1/6}$
        $\lesssim n^{-\varepsilon'} \lesssim 1$.

Proof of (1.6.14). This follows from (1.6.13) and (1.6.12) and the identity $\Psi = 1 + \Psi^{\mathrm{sd}} + \Psi^{\neg}$
and the triangle inequality. □

1.7. The Beck Gain in the Simplest Instance

Beck considered sums of products of r functions that are not strongly distinct, and
observed that the $L^2$ norms of such sums are smaller than one would naively expect. This
is what we call the Beck Gain. A product of r functions will not be strongly distinct if the
product involves two or more vectors which agree in one or more coordinates. In this
section, we study the sums of products of two r functions which are not strongly distinct.
A later section, § 1.9, will study the general case.

In this section, and again in § 1.9, we will use this notation. For a subset $C \subset \mathbb H_n^k$, let

(1.7.1)    $\mathrm{Prod}(C) := \sum_{(\vec r_1, \dots, \vec r_k) \in C}\ \prod_{j=1}^{k} f_{\vec r_j}$.

In this section, we are exclusively interested in $k = 2$.

Let $\mathbb C(2) \subset \mathbb H_n^2$ consist of all pairs of distinct r vectors $\{\vec r_1, \vec r_2\}$ for which $r_{1,2} = r_{2,2}$. J. Beck
calls such terms 'coincidences' and we will continue to use that term. We need norm
estimates on the sums of products of such r vectors.

1.7.2. Lemma. [The Simplest Instance of the Beck Gain.] We have these estimates for
arbitrary subsets $C \subset \mathbb C(2)$:

(1.7.3)    $\|\mathrm{Prod}(C)\|_p \lesssim p^{5/4} n^{7/4}$.

Moreover, if we have $C = \mathbb C(2) \cap A_s \times A_t$ for some $0 \le s, t \le q$ we have

(1.7.4)    $\|\mathrm{Prod}(C)\|_p \lesssim p^{3/2} n^{3/2} q^{-1/2}$.

Finally, we have the estimate 

(1.7-5) II V f r f\ <p 3/2 n 3 V. 

ri=seA s 

We will use the second estimate of the Lemma, which we do not claim for arbitrary
subsets of $\mathbb C(2)$. This estimate appears to be sharp, in that the collection $\mathbb C(2)$ has three free
parameters, and the estimate is in terms of $n^{3/2}$. Note that for $p \simeq n$ we have

    $\|\mathrm{Prod}(\mathbb C(2))\|_n \simeq \|\mathrm{Prod}(\mathbb C(2))\|_\infty$.

And the latter term can be as big as $n^3$, which matches the bound above.
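
The remark that the coincidences have three free parameters can be checked by direct enumeration. The sketch below counts ordered pairs of distinct vectors in $\mathbb H_n$ that agree in the second coordinate; counting ordered rather than unordered pairs, and the function name `coincidences`, are assumptions of this illustration.

```python
def coincidences(n):
    """Ordered pairs (r, s) of distinct vectors in H_n with r_2 = s_2."""
    H = [(r1, r2, n - r1 - r2)
         for r1 in range(n + 1)
         for r2 in range(n + 1 - r1)]
    return [(r, s) for r in H for s in H if r != s and r[1] == s[1]]

for n in (4, 6, 8):
    pairs = coincidences(n)
    # closed form n(n+1)(n+2)/3 for the number of ordered pairs, of order n^3
    print(n, len(pairs), n * (n + 1) * (n + 2) // 3)
```

The count grows like $n^3/3$, and every such pair necessarily differs in both the first and third coordinates, which is what permits the two applications of the Littlewood Paley inequalities in the proof.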

The proof of the Lemma requires we pass through an intermediary collection of four
tuples of r vectors. Let $\mathbb B(4) \subset \mathbb H_n^4$ be four tuples of distinct vectors $(\vec r, \vec s, \vec t, \vec u)$ for which (i)
$r_1 = s_1$ and $t_1 = u_1$, and (ii) in the second and third coordinate two of the vectors agree.

Proof. The method of proof is probably best explained by considering first the case of
$p = 2$. Observe that

    $\|\mathrm{Prod}(C)\|_2^2 = \mathbb E\, \mathrm{Prod}(B)$,

where $B = C \times C \cap \mathbb B(4)$. Indeed, the main point is that in order for

    $\mathbb E\, f_{\vec r} \cdot f_{\vec s} \cdot f_{\vec t} \cdot f_{\vec u} \ne 0$

there must be a coincidence among the four vectors in each coordinate. But this is the definition
of $\mathbb B(4)$. Thus the case $p = 2$ follows immediately from Lemma 1.7.7.

Now, let us consider $4 \le p \le n$, as the inequalities we prove are trivial for $p \ge n$. Let
$K_{3/2}$ be the best constant in the inequality

    $N(p) := \sup \|\mathrm{Prod}(C)\|_p \le K_{3/2}\, p^{3/2} n^{3/2}$.

Here the supremum is over all choices of $n$ and r functions. We give an a priori estimate of
$K_{3/2}$. We define $K_{7/4}$ similarly.

Each pair $(\vec r, \vec s) \in C$ must be distinct in the first and third coordinates. Therefore, we
can apply the Littlewood Paley inequalities in these coordinates to estimate

    $\|\mathrm{Prod}(C)\|_p \lesssim p\, \Big\| \Big[ \sum_{a,b} \Big| \sum_{(\vec r, \vec s) \in C,\ \max\{r_1, s_1\} = a,\ \max\{r_3, s_3\} = b} f_{\vec r} f_{\vec s} \Big|^2 \Big]^{1/2} \Big\|_p$.

Here, we have a full power of $p$, as we apply the Littlewood Paley inequalities twice.
Observe that

    $\sum_{a,b} \Big| \sum_{(\vec r, \vec s) \in C,\ \max\{r_1, s_1\} = a,\ \max\{r_3, s_3\} = b} f_{\vec r} f_{\vec s} \Big|^2 = \sharp C + \sum_{j < k \in \{1,2,3,4\}} \mathrm{Prod}(C_{j,k}) + \mathrm{Prod}(B_{\max})$.

The term $\sharp C$ arises from the diagonal of the square. The terms $C_{j,k}$ are

    $C_{j,k} := \{(\vec r_1, \vec r_2, \vec r_3, \vec r_4) \in C \times C : \vec r_j = \vec r_k$, and the other two vectors are distinct$\}$.

Note that by definition, $C_{1,2} = C_{3,4} = \emptyset$. The term $B_{\max}$ is

    $B_{\max} := \{(\vec r_1, \vec r_2, \vec r_3, \vec r_4) \in C \times C$ : the maximum in the first and third coordinates occur twice$\}$.

Then, we can estimate by the triangle inequality, and the sub-additivity of $x \mapsto \sqrt x$,

(1.7.6)    $\|\mathrm{Prod}(C)\|_p \lesssim p (\sharp C)^{1/2} + p \sum_{j < k \in \{1,2,3,4\}} \|\mathrm{Prod}(C_{j,k})\|_{p/2}^{1/2} + p\, \|\mathrm{Prod}(B_{\max})\|_{p/2}^{1/2}$
        $\lesssim p n^{3/2} + p n^{1/2} N(p/2)^{1/2} + p\, \|\mathrm{Prod}(B_{\max})\|_{p/2}^{1/2}$.

Using the estimate (1.7.8), we see that

    $\|\mathrm{Prod}(C)\|_p \lesssim p n^{3/2} + p n^{1/2} N(p/2)^{1/2} + p^{5/4} n^{7/4}$
        $\lesssim p^{5/4} n^{7/4} + p n^{1/2} N(p/2)^{1/2}$.

This implies that

    $K_{7/4} \lesssim 1 + \sup_{2 \le p \le n} p^{-5/4} n^{-7/4} \cdot p n^{1/2} \big( K_{7/4}\, p^{5/4} n^{7/4} \big)^{1/2}$.

Clearly, this implies $K_{7/4} \lesssim 1$.

Using the estimate (1.7.12), the proof that $K_{3/2} \lesssim 1$ is entirely similar. □

Recall that $\mathbb B(4) \subset \mathbb H_n^4$ consists of four tuples of distinct vectors $(\vec r, \vec s, \vec t, \vec u)$ for which (i) $r_1 = s_1$ and
$t_1 = u_1$; and (ii) in the second and third coordinate two of the vectors agree.

1.7.7. Lemma. For any subset $B \subset \mathbb B(4)$

(1.7.8)    $\|\mathrm{Prod}(B)\|_p \lesssim \sqrt p\, n^{7/2}$.

Moreover, for $B \subset \mathbb B(4) \cap (A_s \times A_t)^2$, for any choice of $0 \le s \ne t \le q$, we have

(1.7.9)    $\|\mathrm{Prod}(B)\|_p \lesssim \sqrt p\, n^{7/2} q^{-2}$.

If we do not consider arbitrary subsets, the estimates improve. We have the estimates

(1.7.10)    $\|\mathrm{Prod}(\mathbb B(4))\|_p \lesssim p n^{3}$,

(1.7.11)    $\|\mathrm{Prod}(\mathbb B(4) \cap (A_s \times A_t)^2)\|_p \lesssim p n^{3}$.

Finally, define

    $B_{\max} := \{(\vec r_1, \vec r_2, \vec r_3, \vec r_4) \in \mathbb B(4) \cap (A_s \times A_t)^2$ :
        the maximum in second and third coordinates occur twice$\}$.

Then, we have the estimate

(1.7.12)    $\|\mathrm{Prod}(B_{\max})\|_p \lesssim p n^{3}$.

This Lemma, with exponents on $n$ being $n^{7/2}$, appears in Beck's paper [4], in the case of
$p = 2$. The $L^p$ variants, following from consequences of Littlewood Paley inequalities, are
important for us.

The first group of estimates are recorded, as it is interesting that they apply to arbitrary
subsets of $\mathbb B(4)$. We will rely upon the second group of estimates. As pointed out to us by
Mihalis Kolountzakis, these estimates are better for all ranges of $p \le n$.

Proof. We discuss (1.7.8) explicitly, and note as we go the improvements needed to get 
the estimate (1.7.9). 

The proof is a case analysis, depending upon the number of vectors among $\{\vec r, \vec s, \vec t, \vec u\}$ at which the maximums occur in the second and third coordinates. We proceed immediately to the cases.

Let $B_2 \subset B$ consist of those four-tuples $\{\vec r, \vec s, \vec t, \vec u\}$ for which

$$r_2 = t_2 = \max\{r_2, s_2, t_2, u_2\}, \qquad r_3 = t_3 = \max\{r_3, s_3, t_3, u_3\}.$$

This collection is empty, for necessarily we must have $r_1 = s_1 = t_1 = u_1$, but then $\vec r = \vec t$, as the parameter of every vector is $n$. This violates the definition of $B$.

Let $B_3 \subset B$ consist of those four-tuples $\{\vec r, \vec s, \vec t, \vec u\}$ for which

$$r_2 = t_2 = \max\{r_2, s_2, t_2, u_2\}, \qquad r_3 = u_3 = \max\{r_3, s_3, t_3, u_3\}.$$

That is, the maximal values involve three distinct vectors. These four vectors can be depicted as

$$\vec r = \begin{pmatrix} r_1 \\ \square \\ r_3 \end{pmatrix}, \quad \vec s = \begin{pmatrix} r_1 \\ * \\ \square \end{pmatrix}, \quad \vec t = \begin{pmatrix} t_1 \\ r_2 \\ \square \end{pmatrix}, \quad \vec u = \begin{pmatrix} t_1 \\ \square \\ r_3 \end{pmatrix}.$$

A $\square$ denotes a parameter which is determined by other choices. It is essential to note that choices of $r_1$ and $r_3$ determine the value of $r_2$ (hence the $\square$ in the middle coordinate for $\vec r$), and so the vector $\vec r$. The only free parameters are (say) $s_2$, denoted by an $*$ above.
But, note that we must then have $|\vec s| = s_1 + s_2 + s_3 < n$. Therefore this case is empty.

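Let $B_4$ be those four tuples $\{\vec r, \vec s, \vec t, \vec u\} \in B$ such that $s_2 = t_2$ and $r_3 = u_3$. That is, there are four vectors involved in the maximums of the second and third coordinates. These four vectors can be represented as

(1.7.13) $$\vec r = \begin{pmatrix} r_1 \\ \square \\ r_3 \end{pmatrix}, \quad \vec s = \begin{pmatrix} r_1 \\ s_2 \\ \square \end{pmatrix}, \quad \vec t = \begin{pmatrix} t_1 \\ s_2 \\ \square \end{pmatrix}, \quad \vec u = \begin{pmatrix} t_1 \\ \square \\ r_3 \end{pmatrix}.$$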







The next argument proves (1.7.8). Let $B_4(a, a', b)$ be those four tuples $\{\vec r, \vec s, \vec t, \vec u\} \in B$ such that

$$r_1 = s_1 = a, \qquad t_1 = u_1 = a', \qquad s_2 = t_2 = b.$$
The point to observe is that

$$\|\operatorname{Prod}(B_4(a, a', b))\|_p \le C \sqrt{p}\, \sqrt{n}.$$

As there are at most $\lesssim n^3$ choices for $a, a', b$ this proves the Lemma. (And, in the case of (1.7.9), there are at most $n(n/q)^2$ choices for these three parameters.)

Indeed, we have not specified $r_3 = u_3$. Since all vectors are distinct, $a \neq a'$, and in considering the norm above, we ignore $\vec s$ and $\vec t$, as they are completely specified by the datum $(a, a', b)$. The product $f_{\vec r} \cdot f_{\vec u}$, in the second coordinate, is equal in distribution to a Rademacher function. And then the estimate above follows. The proofs of (1.7.8) and (1.7.9) are finished.
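
The counting behind (1.7.8) and (1.7.9) can be written out in one line each. The display below is only a sketch of the arithmetic: it combines the triangle inequality over the parameters $(a, a', b)$ with the bound for a single $B_4(a, a', b)$ stated above, and it does not track implied constants.

```latex
\begin{align*}
\|\operatorname{Prod}(B)\|_p
  &\le \sum_{a,a',b} \|\operatorname{Prod}(B_4(a,a',b))\|_p
   \lesssim n^{3}\cdot \sqrt{p}\,\sqrt{n}
   = \sqrt{p}\,n^{7/2},
   && B \subset B(4),\\
\|\operatorname{Prod}(B)\|_p
  &\le \sum_{a,a',b} \|\operatorname{Prod}(B_4(a,a',b))\|_p
   \lesssim n\Bigl(\frac{n}{q}\Bigr)^{2}\cdot \sqrt{p}\,\sqrt{n}
   = \sqrt{p}\,n^{7/2}q^{-2},
   && B \subset B(4)\cap (A_s\times A_t)^2 .
\end{align*}
```

In the second line the two first coordinates $a, a'$ each range over an interval of length $n/q$, while $b$ still ranges over at most $n$ values.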

We turn to the proof of (1.7.10) and (1.7.11), arguing similarly. Let $B_4(a, a')$ be those four tuples $\{\vec r, \vec s, \vec t, \vec u\} \in B$ such that

$$r_1 = s_1 = a, \qquad t_1 = u_1 = a'.$$

The point to observe is that

$$\|\operatorname{Prod}(B_4(a, a'))\|_p \le C p\, n.$$

As there are at most $\lesssim n^2$ choices for $a, a'$ this proves the Lemma. (And, in the case of (1.7.11), there are at most $(n/q)^2$ choices for these two parameters.)

The point is that $\operatorname{Prod}(B_4(a, a'))$ splits into a product. Namely,

$$\operatorname{Prod}(B_4(a, a')) = \operatorname{Prod}(B_{4,1}(a, a')) \cdot \operatorname{Prod}(B_{4,2}(a, a')),$$
$$B_{4,1}(a, a') := \{\{\vec r, \vec u\} : \text{there exists } \{\vec s, \vec t\} \text{ with } \{\vec r, \vec s, \vec t, \vec u\} \in B_4(a, a')\},$$
$$B_{4,2}(a, a') := \{\{\vec s, \vec t\} : \text{there exists } \{\vec r, \vec u\} \text{ with } \{\vec r, \vec s, \vec t, \vec u\} \in B_4(a, a')\}.$$
We estimate

$$\|\operatorname{Prod}(B_4(a, a'))\|_p \le \|\operatorname{Prod}(B_{4,1}(a, a'))\|_{2p} \cdot \|\operatorname{Prod}(B_{4,2}(a, a'))\|_{2p}.$$

Both of the last two norms are at most $\lesssim \sqrt{p} \cdot n^{1/2}$, which will finish the proof. That is, the estimate is

(1.7.14) $\|\operatorname{Prod}(B_{4,1}(a, a'))\|_{2p} \lesssim \sqrt{p} \cdot n^{1/2}$.

We may assume without loss of generality that $a > a'$. The pairs in $B_{4,1}(a, a')$ consist of the two vectors $\vec r$ and $\vec u$ in (1.7.13). These two vectors are parameterized by $u_2$, say. Since $a = r_1 > a' = u_1$ and $r_3 = u_3$, the hyperbolic assumption implies $u_2$ is the maximal coordinate. Therefore, the Littlewood Paley inequality applies.

The proof of (1.7.11) is exactly the same, just noting that a, a' can only take (n/q) 2 values 
in that case. 
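
For the reader's convenience, the two steps of this argument can be chained as follows. This is a sketch under the assumptions already made in the text: the splitting into the two factors, the Cauchy-Schwarz inequality, and the bound (1.7.14) applied to both factors; constants are not tracked.

```latex
\begin{align*}
\|\operatorname{Prod}(B_4(a,a'))\|_p
  &\le \|\operatorname{Prod}(B_{4,1}(a,a'))\|_{2p}\,
       \|\operatorname{Prod}(B_{4,2}(a,a'))\|_{2p}
   \lesssim \bigl(\sqrt{p}\,n^{1/2}\bigr)^{2} = p\,n,\\
\|\operatorname{Prod}(B(4))\|_p
  &\le \sum_{a,a'} \|\operatorname{Prod}(B_4(a,a'))\|_p
   \lesssim n^{2}\cdot p\,n = p\,n^{3}.
\end{align*}
```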

We turn to the proof of the estimate (1.7.12). Here, it suffices to prove that

(1.7.15) $\|\operatorname{Prod}(B(4) \cap (A_s \times A_t)^2 - B_{\max})\|_p \lesssim p\, n^{3}$.

This last collection of four tuples of vectors can be further subdivided into a finite number of collections, $B'_j$, for $1 \le j \le 6$. Take $B'_1$ to be the subset of four tuples $(\vec r, \vec s, \vec t, \vec u) \in B(4) \cap (A_s \times A_t)^2 - B_{\max}$ with



r 



r 3 



s = 



{ S 3 



t = 



( h 

h 
s 3 



u = 



1 si > 

h 
h 



Here we assume that $r_1$ is the unique maximal integer among $\{r_1, s_1, t_1, u_1\}$. Note that $\vec r$ and $\vec s$ have a coincidence in the second coordinate; $\vec s, \vec t$ have a coincidence in the first coordinate; and $\vec s, \vec u$ have a coincidence in the third coordinate. The other collections $B'_j$ differ in the location of the maximums in either the first and third coordinates, and the particular patterns of coincidences.

It is important to observe that we necessarily have $r_1 > t_1 > s_2 = u_2$. And we will apply the Littlewood Paley inequality in the $r_1$ and $t_1$ variables. Clearly, we can apply the Littlewood Paley inequality in $r_1$ to get the estimate

$$\|\operatorname{Prod}(B'_1)\|_p \lesssim \sqrt{p}\, \Bigl\| \Bigl[ \sum_{a} \operatorname{Prod}(B'_1(a))^2 \Bigr]^{1/2} \Bigr\|_p .$$

Here, $B'_1(a)$ is the collection of all four tuples $\{\vec r, \vec s, \vec t, \vec u\} \in B'_1$ with $r_1 = a$.

Next, we use the triangle inequality in the values of $r_2$ and $s_3$. Note that with $r_1, r_2, s_3$ specified, the values of $r_3$ and $s_1$ are then forced. Let $B'_1(a, b, c)$ be the pairs of vectors $\{\vec t, \vec u\}$ for which there are vectors $\{\vec r, \vec s\}$ with $\{\vec r, \vec s, \vec t, \vec u\} \in B'_1$, with in addition

$$r_1 = a, \qquad r_2 = b, \qquad s_3 = c.$$

By the triangle inequality, we can estimate

$$\|\operatorname{Prod}(B'_1)\|_p \lesssim \sqrt{p}\, n^{5/2} \sup_{a,b,c} \|\operatorname{Prod}(B'_1(a, b, c))\|_p .$$

Now, the pairs of vectors in $B'_1(a, b, c)$ have only one free parameter, which can be taken to be the maximum in the first coordinate. Thus, by the Littlewood Paley inequality we see that

$$\|\operatorname{Prod}(B'_1)\|_p \lesssim p\, n^{3}.$$

The analysis of the other possible forms of the collections $B'_j$ proceeds along similar lines. We omit the details.

□ 

There is another corollary to the proof above required at a later stage of the proof. For an integer $a$, let $B_a(4)$ be the four tuples of distinct vectors $(\vec r, \vec s, \vec t, \vec u)$ for which (i) $r_1 = s_1$ and $t_1 = u_1$, and (ii) in the second coordinate we have $r_2 = t_2 = a$; and (iii) two of the four vectors agree in the third coordinate.

1.7.16. Lemma. For any integer $a$, and subset $B \subset B_a(4)$ we have

(1.7.17) $\|\operatorname{Prod}(B)\|_p \lesssim p\, n^{5/2}$.

Moreover, for $B \subset B_a(4) \cap (A_s \times A_t)^2$, for any choice of $0 < s \neq t \le q$, we have

(1.7.18) $\|\operatorname{Prod}(B)\|_p \lesssim p\, n^{5/2} q^{-2}$.

The point of this estimate is that we reduce the number of parameters of $B(4)$ by one, and gain a full power of $n$ in the size of the $L^p$ norm, as compared to the estimate in (1.7.9).

Proof. In the proof of Lemma 1.7.7, in the analysis of the terms $B_3$ and $B_4$ we used the triangle inequality over the term $b = r_2 = t_2$. Treating this coordinate as fixed, we gain a term $n^{-1}$ in the previous proof, hence proving the Lemma above. The additional powers of $q$ are obtained by using the fact that the first coordinates can only vary over a set of size $\simeq n/q$.

□ 

A further sub-case of the inequality (1.7.3) demands attention. Using the notation of Lemma 1.7.2, let

(1.7.19) $C_{2,b} := \{(\vec r_1, \vec r_2) \in C_2 : r_{1,1} = b\}$, $1 \le b \le n$.

Thus, this collection consists of pairs of distinct vectors, with a coincidence in the second coordinate, and the first coordinate of $\vec r_1$ is fixed. Note that these collections of variables have two free parameters. At $L^2$ we find a 1/4 gain over the 'naive' estimate.

1.7.20. Lemma. For any $b$ and any subset $C \subset C_{2,b}$ we have the estimates

(1.7.21) $\|\operatorname{Prod}(C)\|_p \lesssim p \cdot n^{5/4}$, $2 \le p < \infty$.
Moreover, if $C \subset A_s \times A_t$, for any choice of $0 < s \neq t \le q$, we have

(1.7.22) $\|\operatorname{Prod}(C)\|_p \lesssim p \cdot n^{5/4} q^{-1/2}$, $2 \le p < \infty$.

Proof. As in the proof of Lemma 1.7.2, we begin with the case $p = 2$. Observe that

$$\|\operatorname{Prod}(C)\|_2^2 = \mathbb{E} \operatorname{Prod}(B),$$

where $B = C_{2,b} \times C_{2,b} \cap B_b(4)$, with the last collection defined in Lemma 1.7.16. Therefore, the Lemma in this case follows from that Lemma.

More generally, no pair of vectors in $C_{2,b}$ can have a coincidence in the third coordinate, so we can use the Littlewood Paley inequalities in that coordinate to estimate

$$\|\operatorname{Prod}(C)\|_p \lesssim \sqrt{p}\, \Bigl\| \Bigl[ \sum_{c} \Bigl| \sum_{\substack{(\vec r_1, \vec r_2) \in C \\ \max(r_{1,3}, r_{2,3}) = c}} f_{\vec r_1} \cdot f_{\vec r_2} \Bigr|^2 \Bigr]^{1/2} \Bigr\|_p .$$

Observe that

(1.7.23) $$\sum_{c} \Bigl| \sum_{\substack{(\vec r_1, \vec r_2) \in C \\ \max(r_{1,3}, r_{2,3}) = c}} f_{\vec r_1} \cdot f_{\vec r_2} \Bigr|^2 = \sharp C + \sum_{i < j \in \{1,2,3,4\}} \operatorname{Prod}(C_{i,j}) + \operatorname{Prod}(B).$$

Similar to before, we define the collections $C_{i,j}$ as follows.

$$C_{i,j} := \{(\vec r_1, \vec r_2, \vec r_3, \vec r_4) \in C \times C : \vec r_i = \vec r_j, \text{ and the other two vectors are distinct}\}.$$
In this case, observe that five of these collections are empty, namely

$$C_{1,2} = C_{3,4} = C_{1,4} = C_{2,3} = C_{2,4} = \emptyset.$$

The only non-empty collection is $C_{1,3}$. Yet, in $C_{1,3}$, the vectors $\vec r_2$ and $\vec r_4$ have a coincidence in the second coordinate. Thus, Lemma 1.7.2 applies to $C_{1,3}$, so that we have the estimate

(1.7.24) $\|\operatorname{Prod}(C_{1,3})\|_p \lesssim p^{5/4} n^{7/4}$.

Let us prove (1.7.21). Combining these observations with (1.7.23) and Lemma 1.7.16 we see that

$$p^{-1/2} \|\operatorname{Prod}(C)\|_p \lesssim n + \|\operatorname{Prod}(C_{1,3})\|_{p/2}^{1/2} + \|\operatorname{Prod}(B)\|_{p/2}^{1/2} \lesssim n + p^{5/8} n^{7/8} + p^{1/2} n^{5/4}.$$

Concerning the right hand side, note that for $2 \le p \le n^3$, we have $p^{5/8} n^{7/8} \le p^{1/2} n^{5/4}$. Hence we have proved

$$\|\operatorname{Prod}(C)\|_p \lesssim p\, n^{5/4}, \qquad 2 \le p \le n^3.$$

Yet, for $p > n$ the $L^p$ norm above is comparable to the $L^\infty$ norm, so we have finished the proof of (1.7.21).
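
The comparison of the three terms on the right hand side is elementary; the following display records it as a quick check (a sketch of the arithmetic only, with no new input).

```latex
\begin{align*}
\frac{p^{5/8}\,n^{7/8}}{p^{1/2}\,n^{5/4}}
  &= p^{1/8}\,n^{-3/8}
   = \Bigl(\frac{p}{n^{3}}\Bigr)^{1/8} \le 1
  \quad\Longleftrightarrow\quad p \le n^{3},\\
n &\le p^{1/2}\,n^{5/4} \qquad \text{for all } p \ge 1,\ n \ge 1,\\
\text{so that}\qquad
p^{-1/2}\,\|\operatorname{Prod}(C)\|_p
  &\lesssim p^{1/2}\,n^{5/4}
  \quad\text{for } 2 \le p \le n^{3}.
\end{align*}
```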

The case of (1.7.22) is left to the reader. 

□ 

1.8. Norm Estimates Particular to the Hyperbolic Assumption 

The result of Theorem 1.4.1 admits an improvement, which we state in a form adapted to our Riesz product. These improvements are subtle consequences of the detailed information we have about the Beck Gain.

1.8.1. Theorem. Using the notation of (1.6.2) and (1.6.3), we have this estimate, valid for all $1 \le t \le q$.

(1.8.2) $\|\rho F_t\|_p \lesssim \sqrt{p}$, $1 \le p \le c\, n^{1/3}$.
As a consequence, we have the distributional estimate

(1.8.3) $\mathbb{P}(\rho F_t > x) \lesssim \exp(-c x^2)$, $x < c\, n^{1/6}$.

Here $0 < c < 1$ is an absolute constant.

Remark. It is perhaps worth emphasizing that we do not need this Theorem to deduce our main result, Theorem 1.1.7 on the Small Ball Conjecture in three dimensions. 6 Nevertheless, we will use the result above. And we find the proof to be a compelling application of the Beck Gain.

Remark. There are limits to validity to these kinds of inequalities: Recall that one has 
~ ^logN xhus, for appropriate F t we would have 

$$\|\rho F_t\|_\infty \simeq \|\rho F_t\|_{n} \simeq n/\sqrt{q}.$$

Hence, the sub-gaussian bound above can't hold for this range of $p$, unless $q \simeq n$, but then the sub-gaussian estimate is immediate.

Proof. Recall that

$$F_t = \sum_{\vec r \in A_t} f_{\vec r},$$

where $A_t := \{\vec r \in \mathbb{H}_n : r_1 \in I_t\}$, and $I_t$ is an interval of integers of length $n/q$, so that $\sharp A_t \simeq \rho^{-2}$, with $\rho$ defined in (1.6.2).

6 If one does not use the result above, a smaller value of b = | is required.

Apply the Littlewood Paley inequality in the first coordinate. This results in the estimate



\\pm v < vp||E|p L A ] 



2-.1/2I 



selj r:;'!=s 



£ VplH + r f ||^ 
* M 1 + "In) 

P^sfeAf 

n=si 

Of course the terms T t are controlled by the estimate in (1.7.4). In particular, we have 
(1.8.4) \\T t \\ p <Cp 3/2 n 1/2 . 
Hence (1.8.2) follows. 

The second distributional inequality is a well known consequence of the norm inequality. Namely, one has the inequality below, valid for all $x$:

$$\mathbb{P}(\rho F_t > x) \le C^{p} p^{p/2} x^{-p}, \qquad 1 \le p \le c\, n^{1/3}.$$

If $x$ is as in (1.8.3), we can take $p \simeq x^2$ to prove the claimed exponential squared bound. □
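
The choice of $p$ can be made explicit. The display below is a sketch, assuming (1.8.2) is written as $\|\rho F_t\|_p \le C\sqrt{p}$ with a constant $C$ that we introduce here only for illustration; it is Chebyshev's inequality followed by the choice $p = x^2/(eC^2)$.

```latex
\begin{align*}
\mathbb{P}(\rho F_t > x)
  &\le x^{-p}\,\|\rho F_t\|_p^{p}
   \le \Bigl(\frac{C\sqrt{p}}{x}\Bigr)^{p},\\
p = \frac{x^{2}}{e\,C^{2}}
  \quad&\Longrightarrow\quad
  \frac{C\sqrt{p}}{x} = e^{-1/2},
  \qquad
  \mathbb{P}(\rho F_t > x) \le e^{-p/2}
   = \exp\Bigl(-\frac{x^{2}}{2e\,C^{2}}\Bigr).
\end{align*}
```

The constraint $p \le c\, n^{1/3}$ in (1.8.2) then reads $x \lesssim n^{1/6}$, which is the range stated in (1.8.3).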

Remark. The proof above does permit better than 'naive' estimates for $\|\rho F_t\|_p$ for a range of $p > n^{1/3}$. The estimate we have is

$$\|\rho F_t\|_p \lesssim \min\{p,\ \sqrt{p}\,(1 + p^{3/2} n^{-1/2})\}.$$

The first estimate is from Theorem 1.4.1 while the second estimate is from the proof above. The minimum will be the second estimate provided $p < n^{1/2}$. Thus, for $n^{1/3} < p < n^{1/2}$ one can achieve an estimate that is better than that of Theorem 1.4.1.

We now prove a central estimate of the proof. 

1.8.5. Lemma. The estimate (1.6.10) holds. Moreover, we have

(1.8.6) $$\sup_{V \subset \{1, \dots, q\}} \mathbb{E} \prod_{v \in V} (1 + \tilde\rho F_v)^2 \lesssim \exp(a' q^{2b}).$$

Here, $\tilde\rho$ is as in (1.6.2), and $a'$ is a fixed constant times $0 < a < 1$, the small constant that enters into the definition of $\tilde\rho$.

Remark. A conditional expectation argument is essential to this proof. This Lemma is 
also proved in Beck's paper. Yet, due to a more complicated Riesz product, the use of our 
line of reasoning was not available to him.

Proof. The supremum over $V$ will be an immediate consequence of the proof below, and so we don't address it specifically.

Let us give the initial, essential observation. We expand

$$\mathbb{E} \prod_{v=1}^{q} (1 + \tilde\rho F_v)^2 = \mathbb{E} \prod_{v=1}^{q} \bigl(1 + 2\tilde\rho F_v + (\tilde\rho F_v)^2\bigr).$$

Hold the $x_2$ and $x_3$ coordinates fixed, and let $\mathcal{F}$ be the sigma field generated by $F_1, \dots, F_{q-1}$. We have

(1.8.7) $$\mathbb{E}\bigl(1 + 2\tilde\rho F_q + (\tilde\rho F_q)^2 \mid \mathcal{F}\bigr) = 1 + \mathbb{E}\bigl((\tilde\rho F_q)^2 \mid \mathcal{F}\bigr) = 1 + a^2 q^{2b-1} + \tilde\rho^{2} \Gamma_q ,$$

where

$$\Gamma_t := \sum_{\substack{\vec r \neq \vec s \in A_t \\ r_1 = s_1}} f_{\vec r} \cdot f_{\vec s} .$$

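Then, we see that

$$\mathbb{E} \prod_{v=1}^{q} \bigl(1 + 2\tilde\rho F_v + (\tilde\rho F_v)^2\bigr) = \mathbb{E}\Bigl\{ \prod_{v=1}^{q-1} \bigl(1 + 2\tilde\rho F_v + (\tilde\rho F_v)^2\bigr) \times \mathbb{E}\bigl(1 + 2\tilde\rho F_q + (\tilde\rho F_q)^2 \mid \mathcal{F}\bigr) \Bigr\}$$

(1.8.8) $$\le (1 + a^2 q^{2b-1})\, \mathbb{E} \prod_{v=1}^{q-1} \bigl(1 + 2\tilde\rho F_v + (\tilde\rho F_v)^2\bigr)$$

(1.8.9) $$\quad + \Bigl| \mathbb{E} \prod_{v=1}^{q-1} \bigl(1 + 2\tilde\rho F_v + (\tilde\rho F_v)^2\bigr) \cdot \tilde\rho^{2} \Gamma_q \Bigr| .$$

This is the main observation: one should induct on (1.8.8), while treating the term in (1.8.9) as an error, as the 'Beck Gain' estimate (1.7.4) applies to it.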

Let us set up notation to implement this line of approach. Set

$$N(V; r) := \Bigl\| \prod_{v=1}^{V} (1 + \tilde\rho F_v) \Bigr\|_r , \qquad V = 1, \dots, q.$$

We will use the trivial inequality available from the exponential moments 

v 

kv 



N(V-A)<Y[\\l + pF t 



,6-1/2 t/nV 



< (i + cq°- i/z vy 

This of course is a terrible estimate, but we now use interpolation, noting that 
(1.8.10) N(V;2(1 - 1/q)' 1 ) < N{V;2) l ~ llq ■ N{VA) llq ■

We see that (1.8.8), (1.8.9) and (1.8.10) give us the inequality

N(V + 1;2) < (1 + a^f^NiV-l) + C ■ N(V;2(1 - l/^)" 1 ) • \\^T q \\ q 

(1.8.11) < (1 + a 1 q lh - 1 ) ll2 N{V; 2) + CN(V; if' 11 * ■ N(V; 4) 1/<? ||p 2 r 9 ||g 

< (1 + a l tf b - 1 ) 1 f 2 N(V;2) + Cq c n- ll2 N{V;2) 1 -^ . 

In the last line we have used the inequality (1.7.4).

Of course we only apply this as long as N(V; 2) > 1. Assuming this is true for all V > 1, 
we see that 

N(q;2) < (1 + a 2 q 2b ~ 1 + Cq c n ll2 f 

< e a '? h . 

Here of course we need $C q^{c} n^{-1/2} \le a q^{2b-1}$, which we certainly have for large $n$.

□ 

1.9. The Beck Gain 

Let us state the main result of this section. Given $V \subset \{1, \dots, q\}$ let

$$\operatorname{NSD}(V) := \Bigl\{ \{\vec r_j : j \in V\} \in \prod_{j \in V} A_j \ \Big|\ \text{for each } j \in V, \text{ there is a choice of } j' \in V - \{j\} \text{ and } k = 2, 3 \text{ so that } r_{j,k} = r_{j',k} \Bigr\}.$$

That is, we take tuples of r vectors, indexed by $V$, requiring that each $\vec r_j$ be in a coincidence. Such sums admit a favorable estimate on their $L^2$ norms.

1.9.1. Theorem. [The Beck Gain.] There are positive constants $C_0, C_1, C_2, C_3, \eta$ for which we have the estimate

(1.9.2) $$\rho^{|V|}\, \|\operatorname{Prod}(\operatorname{NSD}(V))\|_p \le \bigl[ C_0 |V|^{C_1} p^{C_2} q^{C_3} n^{-\eta} \bigr]^{|V|}, \qquad V \subset \{1, \dots, q\}.$$

Remark. The novelty in this estimate is that we find that (a) the gain is proportional to the number of vertices, and (b) the gain also holds in $L^p$ norms. In application, $p \simeq q^{2b} = \sqrt{q} \simeq n^{\epsilon/2}$, so the polynomial growth in $p$ and in $q$ is acceptable to us. Beck [4] found a gain in $L^2$ norm of order $n^{-1/4}$, for all $V$. Such a small gain of course forces a much shorter Riesz product.

Remark. It is disappointing that we cannot identify a reasonable value of $\eta > 0$, which is in large measure, the amount of the gain. Yet, the goal of this proof is to have a relatively simple method of proof. Obviously, a finer understanding of this estimate, among other issues, will be central to future progress on the range of questions discussed in these notes.

The proof of this Theorem requires a careful analysis of the variety of ways that a 
product can fail to be strongly distinct. That is, we need to understand the variety of ways 
that coincidences can arise, and how coincidences can contribute to a smaller norm.

It is important at the outset to recognize that patterns of coincidences can be quite complex, a point best illustrated by a few examples of such patterns. Consider the specific product

(1.9.3) $$\prod_{j=1}^{7} \sum_{\vec r_j \in A_j} f_{\vec r_j}$$

and the ways that summands in such a product could fail to be strongly distinct. One could consider those terms in which the first three choices of $\vec r_j$ agree in the second coordinate:

$$r_{1,2} = r_{2,2} = r_{3,2},$$

while imposing no restriction on the remaining four vectors $\vec r_4, \vec r_5, \vec r_6, \vec r_7$. Note that

(1.9.4) $$\prod_{j=1}^{7} \sum_{\substack{\vec r_j \in A_j \\ r_{1,2} = r_{2,2} = r_{3,2}}} f_{\vec r_j} = \Bigl[ \prod_{j=1}^{3} \sum_{\substack{\vec r_j \in A_j \\ r_{1,2} = r_{2,2} = r_{3,2}}} f_{\vec r_j} \Bigr] \cdot \Bigl[ \prod_{j=4}^{7} \sum_{\vec r_j \in A_j} f_{\vec r_j} \Bigr].$$

That is, we have a product of terms, with a 'simple' coincidence in the first term, and no restriction on the sum in the second. In this instance, we would take $V = \{1, 2, 3\}$.
Similarly, a pattern of coincidences could be

$$r_{1,2} = r_{2,2} = r_{3,2}, \qquad r_{4,3} = r_{5,3} = r_{6,3} = r_{7,3}.$$

As in the first case, the corresponding sum would break into a product. And the $L^1$ norm would be substantially smaller, due to the presence of two sets of 'simple' coincidences.
Yet, one could have a more complicated set of coincidences, such as

$$r_{1,2} = r_{2,2} = r_{3,2}, \qquad r_{1,3} = r_{4,3} = r_{5,3}, \qquad r_{2,3} = r_{6,3} = r_{7,3}.$$

Here, the first and second vectors are both involved in two distinct sets of coincidences. This case, as it turns out, is also substantially smaller in $L^1$ norm than the first case, due to the 'overlapping' coincidences.

Following Beck, we will use the language of Graph Theory to describe these general 
patterns of coincidences. 

Before passing to the general description of these results, the reader should keep 
forefront in their minds these points: 

• Coincidences can only occur in the second and third coordinates, due to the specific 
way we form our products. 

• Our graphs will have as vertices the integers $j \in \{1, 2, \dots, q\}$, the index of the product in (1.9.3).

• Edges in the graph represent a coincidence between two vectors. Edges come 
in two different types, or colors, associated to coincidence in the second or third 
coordinates. 

• Equality is transitive, so the edges in e.g. the second coordinate will naturally decompose into cliques.

• As we work in three dimensions, a clique in the second coordinate, and a clique in the third coordinate can contain at most one common vertex, as two common vertices would imply that our product contains two equal vectors. This case is specifically excluded from our consideration.

• The presence of an edge will mean that we enforce a coincidence of that type in the products we consider. The absence of an edge will mean that no such condition is assumed, not that equality is forbidden. This will permit product formulas such as (1.9.4) above to hold.

• A graph is naturally associated to sums of products of r functions. We seek effective $L^p$ norms on these sums. Larger cliques, and more overlapping cliques serve to reduce the number of parameters, and give smaller norms.

Graph Theory Nomenclature. We adopt familiar nomenclature from Graph Theory. 7 The class of graphs that we are interested in satisfies particular properties. A graph $G$ is the triple $(V(G), E_2, E_3)$, of the vertex set $V(G) \subset \{1, \dots, q\}$, and edge sets $E_2$ and $E_3$, of color 2 and 3 respectively. Edge sets are subsets of

$$E_j \subset V(G) \times V(G) - \{(k, k) \mid k \in V(G)\}.$$

Edges are symmetric, thus if $(v, v') \in E_j$ then necessarily $(v', v) \in E_j$.

A clique of color $j$ is a maximal subset $Q \subset V(G)$ such that for all $v \neq v' \in Q$ we have $(v, v') \in E_j$. By maximality, we mean that no strictly larger set of vertices $Q' \supsetneq Q$ satisfies this condition.

Call a graph $G$ admissible iff

• The edge sets, in both colors, decompose into a union of cliques.

• Any clique $Q_2$ in color 2 and clique $Q_3$ in color 3 can contain at most one common vertex.

• Every vertex is in at least one clique.

A graph $G$ is connected iff for any two vertices in the graph, there is a path that connects them. A path in the graph $G$ is a sequence of vertices $v_1, \dots, v_k$ with an edge of either color spanning adjacent vertices, that is $(v_j, v_{j+1}) \in E_2 \cup E_3$.
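
The three admissibility conditions and the notion of connectedness are purely combinatorial, so they can be checked mechanically. The following is a minimal sketch, not part of the original argument. The function names `color_classes`, `is_admissible` and `is_connected` are hypothetical, and we assume each edge is stored once as an unordered pair `(v, w)` of vertices.

```python
from itertools import combinations


def color_classes(vertices, edges):
    """Group vertices linked by edges of one color, using union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    # A single vertex carries no coincidence, so it is not counted as a clique.
    return [g for g in groups.values() if len(g) >= 2]


def is_admissible(vertices, e2, e3):
    """Check the three admissibility conditions for edge sets of color 2 and 3."""
    vertices = set(vertices)
    have = {2: {frozenset(e) for e in e2}, 3: {frozenset(e) for e in e3}}
    cliques = {2: color_classes(vertices, e2), 3: color_classes(vertices, e3)}
    # (1) Each class of linked vertices must be a genuine clique.
    for color in (2, 3):
        for group in cliques[color]:
            if any(frozenset(pair) not in have[color]
                   for pair in combinations(group, 2)):
                return False
    # (2) A color-2 clique and a color-3 clique share at most one vertex.
    if any(len(a & b) > 1 for a in cliques[2] for b in cliques[3]):
        return False
    # (3) Every vertex lies in at least one clique.
    covered = set().union(*cliques[2], *cliques[3])
    return covered == vertices


def is_connected(vertices, e2, e3):
    """Check that any two vertices are joined by a path of edges of either color."""
    vertices = set(vertices)
    if not vertices:
        return True
    neighbors = {v: set() for v in vertices}
    for a, b in list(e2) + list(e3):
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in neighbors[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices
```

For instance, `is_admissible({1, 2, 3, 4}, [(1, 2), (3, 4)], [(2, 3)])` accepts the path-like graph with color-2 cliques $\{1,2\}$, $\{3,4\}$ and color-3 clique $\{2,3\}$, while a graph in which the same pair of vertices is joined in both colors is rejected by condition (2).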

Reduction to Admissible Graphs. Given an admissible graph $G$ on vertices $V$, we set $X(G)$ to be those tuples of r vectors

$$\{\vec r_v : v \in V\} \in \prod_{v \in V} A_v$$

so that if $(v, v')$ is an edge of color $j$ in $G$, then $r_{v,j} = r_{v',j}$.

We will prove the Lemma below in the following two sections.



7 There is no graph theoretical fact that we need, rather the use of this language is just a convenient way to do some bookkeeping.

1.9.5. Lemma. For an admissible graph $G$ on vertices $V$ we have the estimate below for positive, finite constants $C_0, C_1, C_2, C_3$:

(1.9.6) $$\rho^{|V|}\, \|\operatorname{Prod}(X(G))\|_p \le \bigl[ C_0 |V|^{C_1} p^{C_2} q^{C_3} n^{-\eta} \bigr]^{|V|}, \qquad 2 \le p < \infty.$$

Let us give the proof of Theorem 1.9.1 assuming this Lemma. Our tool is the Inclusion 
Exclusion Principle, but to apply it we need additional concepts. 

Given two admissible graphs $G_1, G_2$ on the same vertex set $V$, let $G_1 \wedge G_2$ be the smallest admissible graph which contains all the edges in $G_1$ and in $G_2$. By smallest, we mean the graph with the fewest number of edges; and such a graph may not be defined, in which case we take $G_1 \wedge G_2$ to be undefined. We recursively define $G_1 \wedge \cdots \wedge G_k := (G_1 \wedge \cdots \wedge G_{k-1}) \wedge G_k$. This wedge product is associative.

Let $\mathcal{G}_0$ be the set of admissible graphs on $V$ which are not of the form $G_1 \wedge G_2$ for admissible $G_1, G_2$. These are the 'prime' graphs. (If $V$ is of cardinality 2 or 3, every graph is prime.) For instance, in the case of $V = \{v_1, v_2, v_3, v_4\}$ the two graphs below are prime.

    v_1  v_2  v_3  v_4          v_1  v_4  v_2  v_3
     □    □    □    □            □    □    □    □
     •  = •    •  = •            •  = •    •  = •
     •    •    •    •            •    •    •    •

The only difference between the two is the ordering of the vertices in the top row. There are no coincidences in the third row, and the first row, with the □s, never has a coincidence. These two graphs are distinct, and clearly members of $\mathcal{G}_0$. Note that their wedge product is

    v_1  v_2  v_3  v_4
     □    □    □    □
     •  = •  = •  = •
     •    •    •    •

Now define $\mathcal{G}_k$ to be those graphs which are equal to a wedge product $G_1 \wedge \cdots \wedge G_k$, with $G_j \in \mathcal{G}_0$, and moreover, $k$ is the smallest integer for which this is true. Clearly, we only need to consider $k \le q$.

Then, by the inclusion exclusion principle, 

<? 

(1.9.7) Prod(NSD(y)) = J^(-l) fc ^ Prod(X(G)) . 

The number of admissible graphs on a set of vertices $V$ is at most $2^{|V|} |V|! \le 2^{|V|} |V|^{|V|}$. So that using (1.9.6) clearly implies Theorem 1.9.1.

Norm Estimates for Admissible Graphs. We begin this section with a further reduction to connected admissible graphs. Let us write $G \in BG(C_0, C_1, C_2, C_3, \eta)$ if the estimate (1.9.6) holds. ('BG' for 'Beck Gain.') We need to see that all admissible graphs are in $BG(C_0, C_1, C_2, C_3, \eta)$ for non-negative, finite choices of the relevant constants.

1.9.8. Lemma. Let $C_0, C_1, C_2, C_3, \eta$ be non-negative constants. Suppose that $G$ is an admissible graph, and that it can be written as a union of subgraphs $G_1, \dots, G_k$ where all $G_j \in BG(C_0, C_1, C_2, C_3, \eta)$. Then,

$$G \in BG(C_0, C_1, C_2, C_2 + C_3, \eta).$$

With this Lemma, we will identify a small class of graphs for which we can verify the property (1.9.6) directly, and then appeal to this Lemma to deduce Theorem 1.9.1. Accordingly, we modify our notation. If $\mathcal{G}$ is a class of graphs, we write $\mathcal{G} \subset BG(\eta)$ if there are constants $C_0, C_1, C_2, C_3$ such that $\mathcal{G} \subset BG(C_0, C_1, C_2, C_3, \eta)$.

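Proof. We then have by Proposition 1.9.9

$$\operatorname{Prod}(X(G)) = \prod_{j=1}^{k} \operatorname{Prod}(X(G_j)).$$

Using Hölder's inequality, and writing $V_j$ for the vertex set of $G_j$, we can estimate

$$\rho^{|V|}\, \|\operatorname{Prod}(X(G))\|_p \le \prod_{j=1}^{k} \rho^{|V_j|}\, \|\operatorname{Prod}(X(G_j))\|_{kp}$$

$$\le \prod_{j=1}^{k} \bigl[ C_0 |V_j|^{C_1} (kp)^{C_2} q^{C_3} n^{-\eta} \bigr]^{|V_j|}$$

$$\le \bigl[ C_0 |V|^{C_1} p^{C_2} q^{C_2 + C_3} n^{-\eta} \bigr]^{|V|} .$$

Here, we use the fact that since the graphs are non-empty, we necessarily have $k \le q$.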

□ 

1.9.9. Proposition. Let $G_1, \dots, G_p$ be admissible graphs on pairwise disjoint vertex sets $V_1, \dots, V_p$. Extend these graphs in the natural way to a graph $G$ on the vertex set $V = \bigcup_t V_t$. Then, we have

$$\operatorname{Prod}(X(G)) = \prod_{t=1}^{p} \operatorname{Prod}(X(G_t)).$$
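
The product formula rests on the fact that the constraints defining $X(G)$ split over the disjoint vertex sets. A count-level sanity check of this splitting can be sketched as follows; it verifies only that $\sharp X(G) = \prod_t \sharp X(G_t)$ in a tiny example, not the Proposition itself. The helper names `block` and `count_X` are hypothetical, and we assume (these are assumptions, not taken from the text) that $r_2, r_3 \ge 0$, that $q$ divides $n$, and that $I_t = \{(t-1)n/q + 1, \dots, tn/q\}$.

```python
from itertools import product


def block(n, q, t):
    """A_t: vectors (r1, r2, r3) with r1 + r2 + r3 = n and r1 in the interval I_t.

    Assumptions for this sketch: r2, r3 >= 0, q divides n, and
    I_t = {(t-1)n/q + 1, ..., t n/q}.
    """
    width = n // q
    return [
        (r1, r2, n - r1 - r2)
        for r1 in range((t - 1) * width + 1, t * width + 1)
        for r2 in range(n - r1 + 1)
    ]


def count_X(n, q, vertices, edges):
    """Number of tuples in X(G); edges are triples (v, w, color), color in {2, 3}."""
    order = sorted(vertices)
    position = {v: i for i, v in enumerate(order)}
    total = 0
    for choice in product(*(block(n, q, v) for v in order)):
        # An edge of color c forces equality of the c-th coordinates.
        if all(choice[position[v]][c - 1] == choice[position[w]][c - 1]
               for v, w, c in edges):
            total += 1
    return total


# Two graphs on disjoint vertex sets, and their natural extension to {1, 2, 3, 4}.
n, q = 8, 4
edges_1 = [(1, 2, 2)]   # color-2 coincidence between vertices 1 and 2
edges_2 = [(3, 4, 3)]   # color-3 coincidence between vertices 3 and 4
whole = count_X(n, q, {1, 2, 3, 4}, edges_1 + edges_2)
parts = count_X(n, q, {1, 2}, edges_1) * count_X(n, q, {3, 4}, edges_2)
print(whole == parts)
```

The brute-force enumeration is only feasible for very small $n$; it is meant as an illustration of the bookkeeping, not as a tool for the estimates.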

Connected Graphs Have the Beck Gain. Let $\mathcal{G}_{\mathrm{connected}}$ be the collection of all admissible connected graphs on $V \subset \{1, \dots, q\}$.

1.9.10. Lemma. We have $\mathcal{G}_{\mathrm{connected}} \subset BG(\eta)$.

One can depict small examples of these graphs as follows. 

□ □ □□□ □□□□ 

; • = • ; • = • • = • 

• = • • = • • = • 

These are graphs on 2, 3 and 4 vertices respectively. We will have to pay special attention to the case of 2 and 3 vertices, as these cases are not amenable to the general procedure we invoke below. It is important to observe that the first coordinates, represented by □ above, are necessarily distinct, and have the partial order inherited from the vertex set $V$. Namely, the vertex set $V \subset \{1, \dots, q\}$, and $V$ inherits the order from the integers. By the construction of our Riesz product, the first coordinates inherit this same order.

Unfortunately, even working with this class of admissible graphs, our proof is of an ad hoc nature, and we won't actually specify a value of $\eta > 0$ for which the Lemma above holds.

General Remarks on Littlewood Paley Inequality. These remarks are essential to our analysis of this lemma, and the Theorem we are proving. The vertex set $V$ is a subset of $\{1, \dots, q\}$ and it inherits an order from that set. Moreover, the tuples of r vectors do as well. Namely, writing

$$V = \{v_1 < \cdots < v_\ell\},$$

for $\{\vec r_1, \dots, \vec r_\ell\} \in X(G)$, we have, by construction, $r_{1,1} < \cdots < r_{\ell,1}$. This is since $r_{m,1} \in I_{v_m}$, where $I_m$ is the increasing sequence of intervals of length equal to $n/q$ that partition $\{1, \dots, n\}$.
Continuing this line of thought, we see that there is a natural way to apply the Littlewood Paley inequalities. For integer $b_\ell \in I_{v_\ell}$, let $X(G; b_\ell)$ be the tuples of r vectors $\{\vec r_1, \dots, \vec r_\ell\}$ such that $r_{\ell,1} = b_\ell$. We have

(1.9.11) $$\|\operatorname{Prod}(X(G))\|_p \lesssim \sqrt{p}\, \Bigl\| \Bigl[ \sum_{b_\ell \in I_{v_\ell}} |\operatorname{Prod}(X(G; b_\ell))|^2 \Bigr]^{1/2} \Bigr\|_p .$$



It is tempting to continue this procedure, by applying the Littlewood Paley inequality again to the vertex $v_{\ell-1}$. Yet, and this is an important point, due to the nature of r functions, this option is blocked to us. The vertex $v_\ell$ is in at least one clique $Q$ of, say, color 2. We could choose a value $c_Q$ for that clique, thereby specifying all coordinates of the vector $\vec r_\ell$. Set $X(G; b_\ell; c_Q)$ to be the tuples of r vectors $\{\vec r_1, \dots, \vec r_{\ell-1}\}$ such that

$$\{\vec r_1, \dots, \vec r_{\ell-1}, (b_\ell, c_Q, n - b_\ell - c_Q)\} \in X(G; b_\ell).$$

Here, $X(G; b_\ell; c_Q)$ consists of tuples of length $\ell - 1$, since the vector $\vec r_\ell$ is completely specified. Thus, we see that



(1.9.12) ||Prod(X(G))|| p < nsup||[2] ( Prod(X(G;^;c Q )) 2 ] 

a t' c Q b { 



1/2| 



At this point, the (Hilbert space) Littlewood Paley inequalities will again apply. 

We will refer to the notation above. Keep in mind that b is for the coordinates specified 
by a Littlewood Paley inequality; c are for the coordinates in a coincidence that we use the 
triangle inequality on. We shall return to these themes momentarily. 

Proof of Lemma 1.9.10. We begin the proof with a discussion of the case of two and three vertices, which will not be susceptible to the general methods related to the Littlewood Paley inequality outlined above.

The Case of Two Vertices. Notice that if $G$ consists of only two vertices, the relevant estimate is (1.7.4). Namely, we have

||Prod(X(G))|| p < Cf l2 n 3l \ l . 

Equivalently, $G \in BG(C_0, 3/4, 0, 1/4)$.

The Case of Three Vertices. The case of G e Q 2 having three vertices depends critically on 
the same phenomena behind the Beck Gain for graphs on two vertices . We will deduce 
this case as a corollary to the case of two vertices . 

There are two distinct sub-cases. The more delicate of the two cases is as follows. The graph is depicted as

                v_1   v_2   v_3
    (1.9.13)     •  =  •     •
                 •     •  =  •

where $v_1 < v_2 < v_3$. (The case of $v_2 < v_1 < v_3$ is entirely the same, and we don't discuss it directly.)

By our general remarks on the Littlewood Paley inequality, this inequality applies in the first coordinate, to the vertex $v_3$. Using the notation in (1.9.11), we have

$$\|\operatorname{Prod}(X(G))\|_p \lesssim \sqrt{p}\, \Bigl\| \Bigl[ \sum_{b_3 \in I_{v_3}} |\operatorname{Prod}(X(G; b_3))|^2 \Bigr]^{1/2} \Bigr\|_p .$$

The vertices $v_2$ and $v_3$ have a coincidence in the third coordinate. Therefore, we specify the value of the coincidence to be $c_3$ and estimate

(1.9.14) ||Prod(X(G))|| p < VP ■ n 3/2 ■ sup||Prod(X(G; h; c 3 ))\\ P ■ 

Recall that $X(G; b_3; c_3)$ consists only of pairs of vectors. This graph can be depicted as

    v_1   v_2
     □     □
     •  =  •

But this is the case considered in (1.7.22). From that inequality, we see that we have the 
estimate 

||Prod(X(G;& 3 ;c3))llp< VP" 5/ V /2 • 
Therefore, from (1.9.14), we see that 

(1.9.15) ||Prod(X(G))|| p < p 3/2 n 9/ *q~ 3/2 . 

Recall that the point of comparison is to $n^{3} q^{-3/2}$, and the estimate above is smaller by $n^{-1/4}$. Thus the class of graphs given by (1.9.13) is contained in $BG(\tfrac{1}{12} - \epsilon)$.

The other case is when the graph can be depicted by

    v_1   v_3   v_2
     □     □     □
     •  =  •     •
     •     •  =  •

where $v_3$, the maximal index, is in both cliques. This case is much easier, as one application of the Littlewood Paley inequality, and the triangle inequality will determine the value of both cliques. It is very easy to see that this class of graphs is in $BG(1/6)$, and the details are omitted. Hence the discussion of graphs on three vertices, with all cliques of size 2, is complete.

A General Estimate. We now present a general recursive estimate for the $L^p$ norm of $\operatorname{Prod}(X(G))$, assuming that $G$ is a connected graph on at least four vertices. Write $V$ as

$$V = \{v_1 < \cdots < v_\ell\}.$$

The estimate is obtained recursively. Along the way we will construct two disjoint subsets $V_{3/2}, V_{1/2} \subset V$. $V_{3/2}$ will be the vertices to which we apply both the Littlewood Paley and triangle inequalities, thus these vertices contribute $n^{3/2} q^{-1/2}$ to our estimate. $V_{1/2}$ will be the vertices to which we apply only the Littlewood Paley inequality, thus these vertices contribute $(n/q)^{1/2}$ to our estimate. Those vertices not in $V_{3/2} \cup V_{1/2}$ will be those which are determined by earlier steps in the procedure. They contribute nothing to our estimate. In estimating an $L^p$ norm, the power of $p$ is one-half of the number of applications of the Littlewood Paley inequality, namely $\tfrac12 \sharp(V_{3/2} \cup V_{1/2})$.

The purpose of these considerations is to prove the estimate

(1.9.16) $$\|\operatorname{Prod}(X(G))\|_p \le (C\sqrt{p})^{|V_{3/2}| + |V_{1/2}|}\, n^{|V_{3/2}|}\, (n/q)^{(|V_{3/2}| + |V_{1/2}|)/2} .$$

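Initialize

$$V_{3/2} \leftarrow \emptyset, \qquad V_{1/2} \leftarrow \emptyset, \qquad \mathcal{Q}_{\mathrm{fixed}} \leftarrow \emptyset.$$

The last collection consists of those cliques which are specified by earlier stages of the argument.

At each stage, we will have an estimate of the form

(1.9.17) $$\|\operatorname{Prod}(X(G))\|_p \le (C\sqrt{p})^{|V_{3/2}| + |V_{1/2}|}\, n^{|V_{3/2}|} \times \sup_{c \in \{1, \dots, n\}^{\mathcal{Q}_{\mathrm{fixed}}}} \Bigl\| \Bigl[ \sum_{b \in \{1, \dots, n\}^{V_{3/2} \cup V_{1/2}}} \operatorname{Prod}(X(G; b; c))^2 \Bigr]^{1/2} \Bigr\|_p .$$

Here, $X(G; b; c)$ denotes those tuples $\{\vec r_v : v \in V\}$ such that if $v \in V_{3/2} \cup V_{1/2}$ then $r_{v,1} = b_v$. And if $v$ is in a clique $Q \in \mathcal{Q}_{\mathrm{fixed}}$ of color $j$, then $r_{v,j} = c_Q$.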

Base Case of the Recursion. We update $V_{3/2} \leftarrow \{v_\ell\}$, since it is the maximal element. We update $\mathcal{Q}_{\mathrm{fixed}}$ to those cliques which contain $v_\ell$. Then (1.9.17) is a consequence of (1.9.12).

Recursive Case. At this point, we have the datum $V_{3/2}$, $V_{1/2}$, and $\mathcal{Q}_{\mathrm{fixed}}$. We also have datum $b \in \{1, \dots, n\}^{V_{3/2} \cup V_{1/2}}$, and $c \in \{1, \dots, n\}^{\mathcal{Q}_{\mathrm{fixed}}}$. Notice that this datum can completely specify the r vectors associated to vertices not in $V_{3/2} \cup V_{1/2}$; think of a vertex that is in two cliques in $\mathcal{Q}_{\mathrm{fixed}}$.

The recursion stops if every vertex vy, is determined by this datum. Otherwise, let k to 
be the largest integer such that r Vk is not determined by this datum. If no clique in (3fi xe d 
contains v k update 

^3/2 <- ^3/2 u {v k } , 

and update (3fi xe d to include those cliques which contain v k . By application of the Littlewood 
Paley inequality and the triangle inequality the estimate (1.9.17) continues to hold for these 
updated values. 

If some clique in Qn xe d contains v k , then there can be exactly one clique Q Vk which does, 
for otherwise r Vk would be completely specified by these two cliques. Update 

Viiz <- V m U {v k } , 

and update Qa xe d to include all cliques which contain v k . By application of the Littlewood 
Paley inequality and the triangle inequality, the estimate (1.9.17) continues to hold for 
these updated values. 

Once the recursion stops the inequality (1.9.17) holds. But note that we necessarily 
have 

Prod(X(G;fc;c)) 2 = 1, 
-# 

as all r vectors are completely determined by b and c. Therefore, we have proven (1.9.16). 
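
The bookkeeping in the recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions `classify` and `n_exponent_per_vertex` are hypothetical names, a graph is encoded as a list of (color, vertex set) cliques, and the order in which vertices are visited is passed in explicitly rather than derived from the ordering on V. A vertex lying in no fixed clique goes to $V_{3/2}$, a vertex lying in exactly one goes to $V_{1/2}$, and a vertex lying in two or more is treated as determined.

```python
from fractions import Fraction


def classify(order, cliques):
    """Sort vertices into (V_{3/2}, V_{1/2}, determined) lists.

    order   -- vertices in the order in which they are visited (an assumption
               of this sketch; it is supplied by the caller)
    cliques -- list of (color, set_of_vertices) pairs
    """
    fixed = set()                 # indices of cliques specified so far
    v32, v12, v0 = [], [], []
    for v in order:
        mine = [i for i, (_, members) in enumerate(cliques) if v in members]
        n_fixed = sum(1 for i in mine if i in fixed)
        if n_fixed >= 2:
            v0.append(v)          # determined by earlier steps, contributes nothing
            continue
        (v32 if n_fixed == 0 else v12).append(v)
        fixed.update(mine)        # all cliques containing v are now specified
    return v32, v12, v0


def n_exponent_per_vertex(v32, v12, v0):
    """(3/2 |V_{3/2}| + 1/2 |V_{1/2}|) / |V|, the power of n per vertex."""
    total = len(v32) + len(v12) + len(v0)
    return (Fraction(3, 2) * len(v32) + Fraction(1, 2) * len(v12)) / total


# A path on five vertices, edge colors alternating (colors are illustrative).
path5 = [(3, {1, 4}), (2, {4, 2}), (3, {2, 5}), (2, {5, 3})]
print(classify([1, 2, 3, 4, 5], path5))
```

With the visiting order shown, three vertices land in $V_{3/2}$ and the remaining two are determined, so the power of n per vertex is 9/10.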

The Conclusion of the Proof. Since $V_{3/2}$ and $V_{1/2}$ are disjoint subsets of V, we have proven
the inequality

(1.9.18) p |y| ||Prod(X(G))|| p < (C y/pf\ n 2 lv ^ + b v ^-^ . 

And the remaining analysis concerns the exponent on n above, namely we should see that 



(1.9.19) 



ivr 1 ^ 1V3/2I + Jivx/zi 



W\ 



for a fixed positive choice of $\eta$, and all connected graphs G on at least four vertices. We
would conclude that this collection of graphs is in BG($\eta$).

It would be helpful to consider a couple of simple cases. Consider the graph on five
vertices

(1.9.20)
    vertex:      v_1 -- v_4 -- v_2 -- v_5 -- v_3
    membership:  3/2    0      3/2    0      3/2

Note that we specify a particular order on the vertices in the top row, and indicate the
membership of each vertex in $V_{3/2}$, $V_{1/2}$, and in $V_0 := V - V_{3/2} - V_{1/2}$. Note that the zeros
at $v_4$ and $v_5$ are forced. Consider the graph on six vertices



(1.9.21)
    vertex:      v_1 -- v_6 -- v_2 -- v_5 -- v_3 -- v_4
    membership:  3/2    0      3/2    0      3/2    1/2

Here, there is one vertex in $V_{1/2}$, but of course all vertices in $V_{1/2}$ contribute to the Beck
Gain. But the reader should keep in mind that the graphs can in general have a much
more complicated structure than these two linear examples.

The extremal cases in the estimate (1.9.19) are those cases in which $V_{3/2}$ is as large as
possible. To continue, we note another formula. Let E(G) be the total number of edges in
the graph G, and let E(v) be the number of edges in G with one endpoint of the edge being
v.

For $v \in V_{3/2} \cup V_{1/2}$, let F(v) be the number of edges which are specified upon the selection
of that vertex in our recursive procedure. It is clear that we have E(v) = F(v) if $v \in V_{3/2}$.
But also,

$\sum_{v \in V_{3/2} \cup V_{1/2}} F(v) = E(G)$.

It follows that to maximize the cardinality of $V_{3/2}$, those vertices must be in small cliques.
There are two different classes of graphs which are extremal with respect to these criteria.

The first extremal class consists of graphs G with all cliques being of size 2, and the
number of cliques is |V| - 1, that is the graphs are like in (1.9.20) and (1.9.21). For such
graphs, $|V_{3/2}| \le \lceil |V|/2 \rceil$, and if the value is maximal then $|V_{1/2}|$ is 0 if |V| is odd, and 1 if
|V| is even. It is straightforward to see that the maximum of (1.9.19) occurs at |V| = 5, and
is $-\frac{1}{10}$. Here, it is vital that we have already discussed the case of two and three vertices!

The second class are graphs on an even number of vertices, with half the vertices in a
clique Q, and each vertex $v \in Q$ is in one other clique of size 2. One can depict the graph as

    v_1  v_2  v_3  v_4  v_5  v_6

     a    b    c    a    b    c

The vertices are written in increasing order: $v_1 < v_2 < v_3 < v_4 < v_5 < v_6$. Note that $v_1, v_2, v_3$
form a single clique of color 2. There are three additional cliques of size 2, all of color 3.
They are $\{v_j, v_{j+3}\}$ for j = 1, 2, 3. For such a graph, it is clear that $|V_{3/2}| = \frac12 |V|$, and $|V_{1/2}| = 1$.⁸
The term (1.9.19) behaves exactly like the first class of extremal graphs on an even number
of vertices. Our proof is complete.

□ 



⁸ If, for example, the maximal vertex $v_\ell$ were in the clique of size 3, our algorithm would then predict a smaller
estimate for such a graph.



CHAPTER 2



Irregularities of Distributions

2.1. Discrepancy 

We outline the Discrepancy Theory, highlighting its relevance to the Small Ball Problem.
In d dimensions, one takes $\mathcal{A}_N$ to be N points in the unit cube, and considers the function

(2.1.1)  $D_N(x) = \sharp(\mathcal{A}_N \cap [0,x)) - N|[0,x)|$.

Here, $[0,x) = \prod_{j=1}^{d} [0, x_j)$, that is a rectangle with antipodal corners being 0 and x. We will
typically suppress the dependence upon the selection of points $\mathcal{A}_N$. A set of points will
be well distributed if this function is small in some appropriate function space. Thus, it is of
interest to understand the 'min-max' function

$\inf_{\mathcal{A}_N} \|D_N\|_{L^p([0,1]^d)}$, $0 < p \le \infty$.

For the purposes of this note, we will primarily be concerned with lower bounds for this
quantity, with $1 \le p \le \infty$. Dimension will be held fixed, with N large. Many variants of this
question are interesting; the interested reader is encouraged to consult one of the excellent
references in this area.
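
To make the definition of the Discrepancy function concrete, here is a minimal Python sketch that evaluates it at a single point. The function name `discrepancy` and the representation of the point set as a list of coordinate tuples are assumptions made for illustration only.

```python
def discrepancy(points, x):
    """Evaluate D_N(x) = #(points in [0,x)) - N * |[0,x)|.

    points -- list of N tuples in the unit cube
    x      -- tuple, the upper corner of the box [0,x)
    """
    n_points = len(points)
    # A point is counted when every coordinate lies strictly below x.
    inside = sum(all(p_j < x_j for p_j, x_j in zip(p, x)) for p in points)
    volume = 1.0
    for x_j in x:
        volume *= x_j
    return inside - n_points * volume


grid = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
print(discrepancy(grid, (0.5, 0.5)))
```

For the four point grid, the box $[0,\frac12)^2$ contains one point and has volume 1/4, so the two terms cancel.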

It turns out that relevant norms of this function must tend to infinity, in dimensions
2 and higher. Using the basic facts of the next section, we can prove the Theorem below,
which concatenates results of Roth [25] in the case of p = 2 (indeed, the proof we give
below is the 'hyperbolic orthogonal function' method he initiated) and Schmidt [29] for
other values of p. The end point estimate below is a consequence of the method, and doesn't
seem to be as well known.

2.1.2. Theorem. For any collection of points $\mathcal{A}_N \subset [0,1]^d$, we have the estimates

(2.1.3)  $\|D_N\|_p \gtrsim (\log N)^{(d-1)/2}$.

More particularly, we have the endpoint estimate

(2.1.4)  $\|D_N\|_{L(\log L)^{(d-1)/2}} \gtrsim (\log N)^{(d-1)/2}$.

Proof. As is usual, the proof is by duality, following Roth [25], and we use the Haar
function approach of Schmidt [28].

We stick to the hyperbolic setting, with the rationale that extremal point distributions,
whatever they might be, must have about one point in any rectangle of volume about $2^{-n}$.
For each $\vec r \in \mathbb{H}_n$ construct the r function $f_{\vec r}$ as in Proposition 2.3.1, and set

$F := \sum_{\vec r \in \mathbb{H}_n} f_{\vec r}$.

By construction we have

$n^{d-1} \lesssim \langle D_N, F \rangle \le \|D_N\|_2 \|F\|_2 \lesssim \|D_N\|_2\, n^{(d-1)/2}$.

This proves (2.1.3) in the case of p = 2, and by extension for all p > 2. To finish the proof,
recall that $L(\log L)^{(d-1)/2}$ and $\exp(L^{2/(d-1)})$ are dual spaces, see § 3.1. Thus, we should
observe that

$\|F\|_{\exp(L^{2/(d-1)})} \lesssim n^{(d-1)/2}$.

But, the square function of F satisfies

$S(F) := \Bigl[ \sum_{\vec r \in \mathbb{H}_n} f_{\vec r}^2 \Bigr]^{1/2} \lesssim n^{(d-1)/2}$.

The last estimate is an $L^\infty$ estimate. Therefore, by Theorem 1.4.1, we conclude that
$\|F\|_{\exp(L^{2/(d-1)})} \lesssim n^{(d-1)/2}$. This implies the $L(\log L)^{(d-1)/2}$ endpoint estimate for $D_N$ in (2.1.4).

□ 

While this last Theorem is quite adequate for $L^p$, the endpoint cases of $L^\infty$ and $L^1$ are
not amenable to the same techniques, and the relevant fact is that the $L^\infty$ bound should
be larger. In dimension 2, the end point estimates are known. At $L^\infty$, it is the Theorem of
Schmidt [28].

2.1.5. Schmidt's Theorem. We have the estimates below, valid for all collections $\mathcal{A}_N \subset [0,1]^2$:

(2.1.6)  $\|D_N\|_\infty \gtrsim \log N$.

We shall see that this is a rather precise analog of Talagrand's theorem; the proof we 
give will share a great deal of similarity with the proof of Temlyakov we have described 
in § 1.3. 

Let us comment that there is an interpolant between the result of Schmidt and the $L^p$
results, provided one uses the scale of exponential Orlicz classes.

2.1.7. Theorem. We have the estimates below, valid for all collections $\mathcal{A}_N \subset [0,1]^2$:

(2.1.8)  $\|D_N\|_{\exp(L^p)} \gtrsim (\log N)^{1-1/p}$, $2 \le p < \infty$.

In dimensions 3 and higher, there is the following improvement on J. Beck's result [4],
due to Lacey and Bilyk [1].

2.1.9. Theorem. There is a choice of $0 < \eta < \frac12$ for which the following estimate holds for all
collections $\mathcal{A}_N \subset [0,1]^3$:

(2.1.10)  $\|D_N\|_\infty \gtrsim (\log N)^{1+\eta}$.

Beck's result is as above, with $(\log N)^{\eta}$ replaced by a doubly logarithmic term. There
is no further result known about the Small Ball Problem, nor the Discrepancy Function in
higher dimensions.¹

Halasz established the $L^1$ endpoint estimate for the Discrepancy function in two
dimensions. Namely

2.1.11. Halasz' Theorem. For any collection of points $\mathcal{A}_N \subset [0,1]^2$ of cardinality N we have

(2.1.12)  $\|D_N\|_1 \gtrsim \sqrt{\log N}$.

While the $L^\infty$ case is in close analogy to the Small Ball Conjecture, this analogy breaks
down in this case. We will give Halasz' proof of this result, as well as a new one, which is
again a duality method, but the construction of the dual function is not by way of a Riesz
product. See § 2.6.

In the reverse direction, concerning point distributions with small Discrepancy func- 
tion, the following is known. 

2.1.13. Theorem. In dimension d, there are point distributions with

$\|D_N\|_p \lesssim (\log N)^{(d-1)/2}$, $0 < p < \infty$.

These constructions are delicate, and the product of significant effort over a period 
of decades. See especially Davenport [11], Roth [26,27], and Chen [14]. These earlier 
constructions were random in nature; recently Chen and Skriganov [12, 13] found subtle 
deterministic constructions. 

On the other hand, Schmidt's result is sharp, for Halton [21] has constructed point sets
with Discrepancy function of $L^\infty$ norm that matches his lower bound.

2.1.14. Halton's Theorem. For dimension $d \ge 2$ there are point sets with

$\|D_N\|_\infty \lesssim (\log N)^{d-1}$.

2.2. Conjectures for Discrepancy 

The $L^\infty$ Conjectures. In light of the close connection between the proof of the lower
bounds in the $L^\infty$ case and the Small Ball Conjecture, one suspects that an extra square
root of $n \simeq \log N$ is all that should be obtainable at the end point estimate at $L^\infty$ for the
discrepancy function.

2.2.1. Hyperbolic Sup Norm Conjecture. For all choices of N points we have

(2.2.2)  $\|D_N\|_\infty \gtrsim (\log N)^{d/2}$.



¹ The student of the literature will find an article published some years ago that claims an extension of
Beck's result to higher dimensions. While this paper can serve as a useful summary of Beck's argument, an
early critical Lemma in that paper is in error; a technique to repair the error is unknown to me.

What should be clear, in light of the sharpness of the Small Ball Conjecture, is that those
who hold the conviction that this last conjecture falls short of the truth will necessarily
seek a proof other than the hyperbolic one.

2.2.3. Sharpness of the Hyperbolic Sup Norm Conjecture. We have the estimate

(2.2.4)  $\inf_{\mathcal{A}_N} \|D_N\|_\infty \lesssim (\log N)^{d/2}$.

In this paper, we emphasize the similarity in proof techniques in the Small Ball Problem
and the Discrepancy problems. It would be of interest to establish some formal connection
between these two problems.

The reader can consult the survey article by Temlyakov [37] for a discussion of the
connection between the Discrepancy function in $L^\infty$ and cubature formulas.

One suspects that Theorem 2.1.7 is sharp. (Compare to [16].)

2.2.5. Conjecture. In dimension 2, one has

$\min_{\mathcal{A}_N} \|D_N\|_{\exp(L^\alpha)} \simeq (\log N)^{1-1/\alpha}$, $2 \le \alpha < \infty$.

The $L^1$ Conjecture. The other outstanding conjecture concerns the $L^1$ norm endpoint.

2.2.6. $L^1$ Norm Conjecture. In any dimension d one has the estimate

$\|D_N\|_1 \gtrsim (\log N)^{(d-1)/2}$.

It appears that any improvement in the estimate (2.1.4), by e.g. replacing the logarithmic
Orlicz space by one closer to $L^1$, will generate an interesting new proof technique.

The $L^p$ Conjecture, for 0 < p < 1. One can ask about the size of the Discrepancy
Function in $L^p$, for 0 < p < 1. The absence of duality methods has prevented any progress
towards this conjecture.

2.2.7. Conjecture. We have the estimate below, for all 0 < p < 1.

$\|D_N\|_p \gtrsim (\log N)^{(d-1)/2}$.

Here, we indicate a result in this direction.

2.2.8. Theorem. For 0 < p < 1, and dimension $d \ge 2$ we have the estimate

$\|M D_N\|_p \gtrsim (\log N)^{(d-1)/2}$.

Here, M denotes the strong maximal function in d dimensions, thus

$Mf(x) = \sup_{R \text{ dyadic}} \mathbf{1}_R(x)\, |\mathbb{E}(f \mid R)|$.

Proof. We are uncertain as to how interesting this is, so our proof is somewhat abbreviated.
The only real observation to make is that the theory of multi-parameter Hardy
space is relevant. See [8,9]. In particular, letting $H^p$ denote Hardy space, one has

$\|f\|_{H^p} \simeq \|Mf\|_p \simeq \|S(f)\|_p$, $0 < p \le 1$.

We apply this to $D_N$. Let $\mathcal{G}$ be the class of good rectangles, as defined in Proposition 2.3.1.
We then have

$\|M D_N\|_p \simeq \|S(D_N)\|_p \gtrsim \Bigl\| \Bigl[ \sum_{R \in \mathcal{G}} \mathbf{1}_R \Bigr]^{1/2} \Bigr\|_p$.

It is an elementary exercise to see that the last term is $\gtrsim (\log N)^{(d-1)/2}$. □

2.3. Elementary Propositions 

Throughout, we will specify n by $2N \le 2^n < 4N$, so that $n \simeq \log N$. The value of n plays
the same role in this section as it does in our discussion of the Small Ball Conjecture. In
this section, we use the notation and definitions of § 1.5.

Recall that f is an r function if it is equal to

$f = \sum_{R \in \mathcal{R}_{\vec r}} \varepsilon_R h_R$,

where $\varepsilon_R \in \{-1, 0, 1\}$. Recall that $\mathcal{R}_{\vec r}$ consists of all dyadic rectangles R with $|R_j| = 2^{-r_j}$ for
all coordinates j.

2.3.1. Proposition. For each $\vec r \in \mathbb{H}_n$, there is an r function $f_{\vec r}$ with

$\langle D_N, f_{\vec r} \rangle \ge c_d$.

Here $c_d$ is a dimensional constant.

Proof. There is a very elementary one dimensional fact: For all dyadic intervals I,

(2.3.2)  $\mathbb{E}\, x\, \mathbf{1}_I(x)\, h_I(x) = \tfrac14 |I|^2$.

This immediately implies that in any dimension

$\mathbb{E}\, h_R(x)\, |[0,x)| = 4^{-d} |R|^2$.

We shall rely upon the construction of the function $f_{\vec r}$ below. Recall that $\mathcal{A}_N$, the
distribution of N points in the unit cube, is fixed. Call a rectangle $R \in \mathcal{R}_{\vec r}$ good if R does not
intersect $\mathcal{A}_N$, otherwise call it bad. Set

(2.3.3)  $f_{\vec r} := \sum_{R \in \mathcal{R}_{\vec r},\ R \text{ good}} h_R + \sum_{R \in \mathcal{R}_{\vec r},\ R \text{ bad}} \mathrm{sgn}(\langle D_N, h_R \rangle)\, h_R$.

Each bad rectangle contains at least one point in $\mathcal{A}_N$, and $2^n \ge 2N$, so there are at least
N good rectangles. Moreover, since the counting function $\sharp(\mathcal{A}_N \cap [0,x))$ is constant over
each good rectangle, we have

$\langle D_N, h_R \rangle = N \prod_{j=1}^{d} \langle x_j, h_{R_j} \rangle = N 2^{-2n-2d} \gtrsim 2^{-n}$.

Hence, we can estimate

$\langle D_N, f_{\vec r} \rangle \ge \sum_{R \in \mathcal{R}_{\vec r},\ R \text{ good}} \langle D_N, h_R \rangle \gtrsim 2^{-n}\, \sharp\{R \in \mathcal{R}_{\vec r} : R \text{ is good}\} \gtrsim 1$.

And so our proof is complete. □
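
The two elementary ingredients of this proof can be checked by direct computation. The sketch below is illustrative only: the helper names are hypothetical, it works in dimension 2, and it assumes the sign convention that $h_I$ equals -1 on the left half of I and +1 on the right half, which is the convention under which (2.3.2) holds with a plus sign. The first helper computes $\int_I x\, h_I(x)\, dx$ exactly; the second counts the good rectangles of a given shape.

```python
from fractions import Fraction


def haar_first_moment(left, right):
    """Exact integral of x * h_I(x) over I = [left, right).

    Assumes h_I = -1 on the left half of I and +1 on the right half.
    """
    mid = (left + right) / 2
    right_half = (right**2 - mid**2) / 2   # integral of x over [mid, right)
    left_half = (mid**2 - left**2) / 2     # integral of x over [left, mid)
    return right_half - left_half


def count_good_rectangles(points, r):
    """Number of dyadic rectangles of shape 2^-r[0] by 2^-r[1] in the unit
    square that contain none of the given points."""
    occupied = {(int(x * 2 ** r[0]), int(y * 2 ** r[1])) for x, y in points}
    return 2 ** (r[0] + r[1]) - len(occupied)


interval = (Fraction(3, 8), Fraction(1, 2))
print(haar_first_moment(*interval), (interval[1] - interval[0]) ** 2 / 4)
```

With N = 3 points and n = 3, so that $2N \le 2^n < 4N$, each shape $\vec r$ with $|\vec r| = 3$ has 8 rectangles, of which at most 3 are bad.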
Another proposition of a similar flavor is this. 

2.3.4. Proposition. Let $f_{\vec s}$ be any r function with $|\vec s| > n$. We have

$|\langle D_N, f_{\vec s} \rangle| \lesssim N 2^{-|\vec s|}$.

Proof. This is a brute force proof. Consider the linear part of the Discrepancy function.
By (2.3.2), we have

$|\langle N |[0,x)|, f_{\vec s} \rangle| \le N \sum_{R \in \mathcal{R}_{\vec s}} \prod_{j=1}^{d} \langle x_j, h_{R_j} \rangle = N 4^{-d} 2^{-|\vec s|}$,

as claimed.

Consider the part of the Discrepancy function that arises from the point set. Observe
that for any point $x_0$ in the point set, we have

$|\langle \mathbf{1}_{[0,x)}(x_0), f_{\vec s} \rangle| \lesssim 2^{-|\vec s|}$.

Indeed, of the different Haar functions that contribute to $f_{\vec s}$, there is at most one with non
zero inner product with the function $\mathbf{1}_{[0,x)}(x_0)$ as a function of x. It could only be the one
rectangle which contains $x_0$ in its interior. Thus the inequality above follows. Summing it
over the N points in the point set finishes the proof of the Proposition.

□ 

A final, general proposition is relevant. 

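2.3.5. Proposition. Fix a collection of r functions $\{f_{\vec r} : \vec r \in \mathbb{H}_n\}$. Fix $\vec s$ with $|\vec s| > n$, and let
$3 \le k \le |\vec s| - n + 1$. Let Count($\vec s$; k) be the number of ways to choose strongly distinct $\vec r_1, \dots, \vec r_k \in \mathbb{H}_n$
so that $\prod_{j=1}^{k} f_{\vec r_j}$ is an $\vec s$ function. We have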

(2.3.6) County k) < (\s\ - nf ■ k 3 ■ 

For k = 2 we have

(2.3.7)  $\mathrm{Count}(\vec s; 2) \lesssim |\vec s| - n$.

Proof. This estimate is only of interest for $|\vec s| < dn$, and is very crude. Fix $\vec s$. We want
to choose strongly distinct $\vec r_1, \dots, \vec r_k \in \mathbb{H}_n$ so that for all coordinates $1 \le t \le d$ we have

$\max\{r_{1,t}, \dots, r_{k,t}\} = s_t$.

(Of course if the $\vec r_j$ are not strongly distinct, the product need not be an r function.) Observe
that for given $\vec s$, there are at most $(|\vec s| - n)^{d-1}$ vectors $\vec r \in \mathbb{H}_n$ with $r_t \le s_t$ for all coordinates
t.

Since the product is to be an r function with parameter $\vec s$, we must have either two or
three of the chosen r functions whose parameters are maximal, and equal to the corresponding
coordinate of $\vec s$. There are at most $k^3$ ways to select these r functions among the k terms we
are forming the product over. And having selected them, there are at most $(|\vec s| - k)^2$ ways to
select these r functions. The remaining k - 3 r functions can be selected freely. This gives (2.3.6).

The second estimate (2.3.7) is easier. 

□ 

2.4. Proof of Schmidt's Theorem 

We prove the Theorem of Schmidt; this section should be compared to § 1.3. With the
r functions as constructed in the proof of Proposition 2.3.1, we set

$\Psi := \prod_{\vec r \in \mathbb{H}_n} (1 + a f_{\vec r})$.

Here, 0 < a < 1, and to be specific, we can choose $a = 2^{-6}$. Clearly, this is a non negative
function, with $\int \Psi\, dx = 1$. And so we should argue that

$\langle D_N, \Psi \rangle \gtrsim n$.

Write the function as

$\Psi = \sum_{k=0}^{n} \Psi_k, \qquad \Psi_k := a^k \sum_{W \subset \mathbb{H}_n,\ \sharp W = k}\ \prod_{\vec r \in W} f_{\vec r}$,

where we understand that $\Psi_0 = \mathbf{1}_{[0,1]^2}$.

Clearly, $\langle D_N, \Psi_0 \rangle = 0$. By Proposition 2.3.1, we have

(2.4.1)  $\sum_{\vec r \in \mathbb{H}_n} \langle D_N, a f_{\vec r} \rangle \gtrsim a n$.

For this, recall that we are specializing to the case of dimension 2.

We provide an upper bound on the remaining inner products $\langle D_N, \Psi_k \rangle$ for $k \ge 2$.²
Let $W \subset \mathbb{H}_n$ be a subset of cardinality at least 2. Then, the product

$\prod_{\vec r \in W} f_{\vec r}$

is again a sum of Haar functions, by the Product Rule! See Theorem 1.3.1. If this product
is a $\vec w$ function, then by Proposition 2.3.4,

$\Bigl| \Bigl\langle D_N, \prod_{\vec r \in W} f_{\vec r} \Bigr\rangle \Bigr| \lesssim N 2^{-|\vec w|}$.

² Note that in the small ball problem, this set is not needed!

Now, for a fixed k and $\vec w$ with $n + k \le |\vec w| \le 2n$, we count the number of distinct ways of
choosing W so that $\prod_{\vec r \in W} f_{\vec r}$ is a $\vec w$ function. The first coordinates of the vectors $\vec r$ must be k
distinct integers in the range

$n - w_2 \le r_1 \le w_1$.

Moreover, there must be choices of $\vec r \in W$ whose first coordinates are equal to either
endpoint. There are clearly at most

(2.4.2)  $\binom{|\vec w| - n - 2}{k - 2}$

choices of W.

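For an integer $n \le \omega \le 2n$, there are at most 2n vectors $\vec w$ with $|\vec w| = \omega$. Therefore,

$|\langle D_N, \Psi_k \rangle| \le 2n a^k N \sum_{\omega = n+k}^{2n} 2^{-\omega} \binom{\omega - n - 2}{k-2} = n a^k N 2^{-n-k+1} \sum_{\omega = 0}^{n-k} 2^{-\omega} \binom{\omega + k - 2}{k-2}$.

This must be summed over $2 \le k \le n$. This sum is treated by two changes of variables.
(One is $v = \omega + k$.)

$\sum_{k=2}^{n} |\langle D_N, \Psi_k \rangle| \le n 2^{-n+1} N \sum_{k=2}^{n} \sum_{\omega=0}^{n-k} a^k 2^{-k-\omega} \binom{\omega + k - 2}{k-2}$

$= n a^2 2^{-n-1} N \sum_{k=0}^{n-2} \sum_{\omega=0}^{n-k-2} a^k 2^{-k-\omega} \binom{\omega + k}{k}$

$\le n a^2 2^{-n-1} N \sum_{v=0}^{n} \sum_{k=0}^{v} a^k 2^{-v} \binom{v}{k}$

$\le n a^2 2^{-n-1} N \sum_{v=0}^{n} 2^{-v} (1+a)^v$

$\le 4 n a^2$.

For a sufficiently small, we see that this estimate is much smaller than the lower bound in
(2.4.1), so that our proof is complete.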
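
The summation over k and $\omega$ can be sanity checked numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, and the quantity computed is the double sum $\sum_{k=2}^{n} \sum_{\omega=0}^{n-k} a^k 2^{-k-\omega} \binom{\omega+k-2}{k-2}$, which the two changes of variables bound by the geometric series $\frac{a^2}{4} \sum_{v \ge 0} \bigl(\frac{1+a}{2}\bigr)^v = \frac{a^2}{2(1-a)}$.

```python
from math import comb


def tail_sum(n, a):
    """Double sum over 2 <= k <= n and 0 <= omega <= n - k of
    a^k * 2^(-k-omega) * C(omega + k - 2, k - 2)."""
    total = 0.0
    for k in range(2, n + 1):
        for omega in range(0, n - k + 1):
            total += a**k * 2.0 ** (-k - omega) * comb(omega + k - 2, k - 2)
    return total


def geometric_bound(a):
    """Closed form (a^2 / 4) * sum_{v >= 0} ((1 + a) / 2)^v, valid for 0 < a < 1."""
    return a**2 / (2 * (1 - a))


a = 2.0 ** -6
for n in (4, 10, 30):
    print(n, tail_sum(n, a), geometric_bound(a), 4 * a**2)
```

Since $N 2^{-n+1} \le 1$ under the normalization $2N \le 2^n$, multiplying the double sum by n gives a bound of the form $4 n a^2$ whenever $a \le 7/8$.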

The proof of Theorem 2.1.7 is a simple corollary to the proof above. Since $\|\Psi\|_\infty \lesssim 2^n$,
it is clear that we have

$\|\Psi\|_{L(\log L)^{1/p}} \lesssim (\log N)^{1/p}$.

Therefore, we can estimate for $2 \le p < \infty$

$\log N \lesssim \langle D_N, \Psi \rangle \lesssim \|D_N\|_{\exp(L^p)}\, \|\Psi\|_{L(\log L)^{1/p}}$.
2.5. Proof of Theorem 2.1.9 

We rely upon § 1.6. We see that for W d as defined in (1.6.6), that we have \\W S % < 1. 
Moreover, we have 

(2.5.1) (D N , Wf > > V" ^ an 1+e/4 

Here, g is defined as in (1.6.1), and < a < 1 is a small constant. Again, ^ b - rf^ is the 
'gain over the trivial estimate.' 

Consider the terms arising from \P? d . These are products of strongly distinct r vectors. 
Hence, we combine the estimates from Proposition 2.3.4 and Proposition 2.3.5 as follows. 
For k = 2 we have 

3n 

KD N/ ^f>|<£ Yu [^] 2 -N2-"- /! - County 2) 

h=l s:\<i\=n+h 
h 3/J 

"9 l 2 ~ 



^=1 

< = n e/2 



This is much smaller than the main term (2.5.1). 

We treat the terms arising from Wk for k > 3 as follows. 



fc=3 fe=fc v / 

h^-x /=n \ ^ / 



fc=3 fc=2 b=fc |s|=n+fe 



We have crudely estimated a term or two, and reversed the order of summation. Observe
that $q = n^{\varepsilon}$ is much smaller than n, so that we can estimate
It follows that

<? acj b 3 2n 

£|<D N ,^ d >| < q 3 [^-] £ h 2 2~ h < q e ■ n~ 3 

k=3 h=3 

which is again much smaller than the main term (2.5.1). Our proof is complete. 

2.6. The L 1 bound in dimension 2 

We will indicate two proofs of Halasz' Theorem 2.1.11. The first is the proof of Halasz.
Let $f_{\vec r}$ be the r functions as in Proposition 2.3.1. Consider the Riesz product

$\Psi := \prod_{\vec r \in \mathbb{H}_n} \bigl(1 + i a n^{-1/2} f_{\vec r}\bigr)$.

Here, 0 < a < 1 is a small constant to be chosen. Because of the imposition of the imaginary
number, it is evident that this $\Psi$ is a bounded complex valued function. But one can
argue that

$\langle D_N, \mathrm{Im}(\Psi) \rangle \gtrsim \sqrt{n}$,

much along the lines of the argument used to prove Schmidt's theorem. We omit the details.

The second proof is, as far as the author knows, new; as with Halasz' proof, it does
not admit a straightforward extension to higher dimensions. We offer it as a technically
interesting object, as the function we use is not a Riesz product, rather it is

(2.6.1) 0:=sin(^V /,). 

As usual, 0 < a < 1 is a sufficiently small constant. And we argue that $\langle D_N, \Phi \rangle \gtrsim \sqrt{n}$.

Recall that the argument of the sine function above has $\exp(L^2)$ norm bounded independently
of n. Thus, as one may directly check, the Taylor expansion of $\Phi$ is convergent in all
$L^p$. That is, we may expand

and the sum is convergent in all $L^p$, $1 \le p < \infty$. A remarkable fact is that this infinite
expansion is in fact a finite sum. To see this, let us observe the odd powers above have a
simple closed form.

2.6.3. Lemma. For integers k 

min(rc.2A:+l) , ... 



\f\=n v=\ 
v odd
where $G_v := \sum_{\vec r_1, \dots, \vec r_v \text{ distinct}}\ \prod_{w=1}^{v} f_{\vec r_w}$.

The last sum is over all distinct v tuples of r vectors with $|\vec r| = n$.

Proof. Only odd products of r functions can occur in the expanded product. Fix v odd, 
and distinct r vectors r\,...,f v . It suffices to count the number of ways this product can 
arise from the expanded product. But this is 



(2k + 1\ (2k + l-v) n(2k+1 _ v)/2 
\ v ) 2 V 



Indeed from the terms 

we choose v terms from which we take one of the pre-specified r functions f? fa. These 
products can be specified in one of v\ ways. 

In the remaining 2k + 1 - v terms, we divide them into groups of two. And select one 
of n r functions for each pair. This proves the Lemma. □ 



Expanding the Taylor series we see that 

(_l)0fc+i)/2 

IT 

k odd pfeH 



= £ (_if*W2 2 -3ir/2 £ 2 " 1 n" 1 G„ 

k odd v=\ 
v odd 

n 

(2.6.4) = c Y {-lf +l)l2 2- v n- vl2 G v . 



0=1 

v odd 



Here,c = (1 -4" 3 / 2 )" 1 . 

We turn our attention to the terms in (2.6.4). Now, by construction, we have 



<D N ,n- 1/2 G!> > n- 1/2 Y J (DNji) > n 1 ' 1 - y/to^N . 



reM„ 



As for the terms 3 < v < n, note that by Proposition 2.3.4, (2.4.2) and the definition of 
G-o, we have 



\(D N ,G V )\<N 2 



2n / ^ 

s-n-Y 



s=n+v-l 



v-2
And so we estimate as follows. Here it is convenient that the sum is only over odd $v \ge 3$.

n n In i .. \ 

^2-^ 2 KD N/ G„)|<N^ £ 2 ~ SV T~ n -2 j 

v=3 v=3 s=n+v-l ' ' 

v odd 

2n s—n—1 

< Nn- 1 Y Yj 2~ s ~ v n~ v 

s=n+3 v=0 
In 

<Nn" 1 ^2- s (l + 1/ V") s_ " 



v odd v odd 

In s—n—1 

0/2 



s-n-V 

v I 



Since this estimate tends to zero with n, this proves our Theorem for sufficiently large n.



CHAPTER 3 



Some Aspects of Harmonic Analysis 

3.1. Exponential Orlicz Classes 

Let $\psi : \mathbb{R} \to \mathbb{R}$ be a symmetric convex function with $\psi(x) = 0$ iff x = 0. Define the
Orlicz norm

(3.1.1)  $\|f\|_{\psi} := \inf\{C > 0 : \mathbb{E}\, \psi(f/C) \le 1\}$.

We take the infimum of the empty set to be $+\infty$, and denote by $L^{\psi}$ the collection of
functions for which $\|f\|_{\psi} < \infty$.

It is straightforward to see that $\|\cdot\|_{\psi}$ is in fact a norm, with the triangle inequality
following from Jensen's inequality. If $\psi(x) = |x|^p$, then $\|\cdot\|_{\psi}$ is the usual $L^p$ norm.
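
The definition (3.1.1) can be evaluated numerically for a function taking finitely many values with equal probability. The following Python sketch is only an illustration: `orlicz_norm` is a hypothetical helper, and it finds the infimum by bisection, relying on the fact that $C \mapsto \mathbb{E}\,\psi(f/C)$ is non-increasing for symmetric convex $\psi$ with $\psi(0) = 0$.

```python
import math


def orlicz_norm(values, psi, tol=1e-12):
    """Approximate inf{C > 0 : mean(psi(v / C)) <= 1} by bisection.

    values -- the values of f, each taken with probability 1/len(values)
    psi    -- a symmetric convex function with psi(0) = 0
    """
    if all(v == 0 for v in values):
        return 0.0

    def admissible(c):
        return sum(psi(v / c) for v in values) / len(values) <= 1.0

    lo, hi = 0.0, 1.0
    while not admissible(hi):      # grow until the constraint is met
        hi *= 2.0
    while hi - lo > tol * hi:      # shrink the bracket [lo, hi]
        mid = (lo + hi) / 2.0
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# With psi(x) = |x|^2 this recovers the usual L^2 norm.
print(orlicz_norm([3.0, 4.0], lambda x: abs(x) ** 2), math.sqrt(12.5))
```

For the constant function f = 1 and $\psi(x) = e^{x^2} - 1$, the constraint reads $e^{1/C^2} - 1 \le 1$, so the norm is $1/\sqrt{\log 2}$.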

We are especially interested in the class of $\psi$ given by

$\psi_{\alpha}(x) = e^{|x|^{\alpha}}, \qquad |x| \gtrsim 1$.

Here, we insist upon equality for |x| sufficiently large, depending upon $\alpha$. We will write
$L^{\psi_{\alpha}} = \exp(L^{\alpha})$. These are the exponential Orlicz classes.

Especially important is the case of $\alpha = 2$, which is the class $\exp(L^2)$ of exponentially
square integrable functions, of which the Gaussian random variables are a canonical
example. A function $f \in \exp(L^2)$ is said to be sub-gaussian.

Using Stirling's formula, and the Taylor expansion for $e^x$, one can check that

3.1.2. Proposition. We have the equivalence of norms

$\|f\|_{\exp(L^{\alpha})} \simeq \sup_{p \ge 1} p^{-1/\alpha} \|f\|_p \simeq \sup_{\lambda > 0} \lambda^{-\alpha} \log \mathbb{P}(|f| > \lambda)$.

One also has a familiar Lemma for the maximum of random variables.

3.1.3. Lemma. Let $X_1, \dots, X_N$ be random variables in $L^{\psi}$ of norm at most one. Then, we have

$\mathbb{E} \sup_{n \le N} |X_n| \le \psi^{-1}(N)$.

So for $X_1, \dots, X_N \in \exp(L^2)$ of norm one, we have

(3.1.4)  $\mathbb{E} \sup_{n \le N} |X_n| \lesssim \sqrt{\log N + 1}$.

Indeed, we will leave it to the reader to verify that under the assumptions above

(3.1.5)  $\bigl\| \sup_{n \le N} |X_n| \bigr\|_{\exp(L^2)} \lesssim \sqrt{\log N + 1}$.

Proof. By Jensen's inequality

$\psi\bigl(\mathbb{E} \sup_{n \le N} |X_n|\bigr) \le \mathbb{E} \sup_{n \le N} \psi(|X_n|) \le \sum_{n=1}^{N} \mathbb{E}\, \psi(|X_n|) \le N$.

The proof is complete. □
Another class of relevant spaces are given by the convex functions

$\varphi_{\beta}(x) := |x| (\log(2 + |x|))^{\beta}$.

We denote $L^{\varphi_{\beta}} = L(\log L)^{\beta}$. The connection with the exponential Orlicz classes is by way
of duality.

(3.1.6)  $[\exp(L^{\alpha})]^{*} = L(\log L)^{1/\alpha}$.
These spaces are closely associated with the extrapolation principle.

3.1.7. Proposition. Let T be a linear operator with

(3.1.8)  $\|T\|_{L^p([0,1]) \to L^p([0,1])} \lesssim (p-1)^{-\alpha}$, $1 < p < 2$, $0 < \alpha \le 1$.

We then have the inequality

(3.1.9)  $\|Tf\|_1 \lesssim \|f\|_{L(\log L)^{\alpha}}$.

More generally,

(3.1.10)  $\|Tf\|_{L(\log L)^{\beta}} \lesssim \|f\|_{L(\log L)^{\alpha+\beta}}$, $0 < \beta < \infty$.

Proof. Let us consider (3.1.9). This inequality is dual to

$\|T^{*} f\|_{\exp(L^{1/\alpha})} \lesssim \|f\|_{\infty}$.

But, taking $f \in L^{\infty}$, with $\|f\|_{\infty} = 1$, and using (3.1.8), we have for $2 < p < \infty$,

$\|T^{*} f\|_p \lesssim p^{\alpha}$,

and so the dual estimate follows from Proposition 3.1.2. The inequality (3.1.10) is entirely
similar. □

3.2. Khintchine Inequalities

The utility of the exponential Orlicz classes is that they allow a concise expression of a
range of inequalities. This is especially relevant to the classical Khintchine Inequalities. In
other instances, we shall see that Orlicz spaces express sharp forms of different
inequalities.

Let $\{r_k : k \ge 1\}$ be independent, identically distributed random variables, with
$\mathbb{P}(r_1 = 1) = \mathbb{P}(r_1 = -1) = \frac12$. Such random variables are referred to as Rademacher random
variables. They admit different realizations, of which the most direct is

$r_k(x) = \mathrm{sgn}(\sin(2^k \pi x))$, $0 \le x \le 1$.

Such random variables are in particular orthogonal, so that we have

$\Bigl\| \sum_k a_k r_k \Bigr\|_2 = \Bigl[ \sum_k a_k^2 \Bigr]^{1/2}$.

This holds for all finite sequences of constants $\{a_k\}$.

The Khintchine Inequality says that these sums, in all $L^p$, are controlled by the $L^2$ norms.
In its sharp form, this inequality states

3.2.1. Khintchine Inequalities. For all finite sequences of constants $\{a_k\}$

(3.2.2)  $\Bigl\| \sum_k a_k r_k \Bigr\|_{\exp(L^2)} \lesssim \Bigl[ \sum_k a_k^2 \Bigr]^{1/2}$.

Proof. The classical proof of this is quite elementary, passing through the Moment
Generating Function. We can restrict attention to the case where

$\sum_k a_k^2 = 1$.

Consider the moment generating function, given by

$\varphi(\lambda) = \mathbb{E}\, e^{\lambda \sum_k a_k r_k}, \qquad \lambda > 0$
$= \prod_k \mathbb{E}\, e^{\lambda a_k r_k}$
$= \prod_k \tfrac12 \bigl( e^{-\lambda a_k} + e^{\lambda a_k} \bigr)$
$\le \prod_k e^{\lambda^2 a_k^2}$
$\le e^{\lambda^2}$.

Here, we have relied upon statistical independence of the random variables. In particular, if
X, Y are independent random variables, then

$\mathbb{E}\, X \cdot Y = \mathbb{E} X \cdot \mathbb{E} Y$.

We have also used the elementary inequality

(3.2.3)  $\tfrac12 (e^{-t} + e^{t}) = \sum_{k=0}^{\infty} \frac{t^{2k}}{(2k)!} \le e^{t^2}$, $t \in \mathbb{R}$.

Now estimate

$\mathbb{P}\Bigl( \sum_k a_k r_k > t \Bigr) \le \varphi(\lambda) e^{-\lambda t} \le e^{\lambda^2 - \lambda t}$, $\lambda > 0$.

The minimum over $\lambda > 0$ of the right hand side occurs at $\lambda = t/2$, giving us the estimate

$\mathbb{P}\Bigl( \sum_k a_k r_k > t \Bigr) \le e^{-t^2/4}$.

In view of the symmetry of the Rademacher random variables and Proposition 3.1.2, this
proves the Theorem.

□
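
The sub-gaussian tail bound obtained in this proof can be compared with the exact distribution of a short Rademacher sum. The sketch below is illustrative: `rademacher_tail` is a hypothetical helper that enumerates all sign patterns, which is only feasible for a small number of terms.

```python
from itertools import product
from math import exp, sqrt


def rademacher_tail(coeffs, t):
    """Exact P(sum_k a_k r_k > t), computed over all 2^m sign patterns."""
    m = len(coeffs)
    hits = sum(
        1
        for signs in product((-1, 1), repeat=m)
        if sum(s * a for s, a in zip(signs, coeffs)) > t
    )
    return hits / 2**m


m = 10
coeffs = [1 / sqrt(m)] * m          # normalized so that sum of a_k^2 is 1
for t in (0.5, 1.0, 2.0, 3.0):
    print(t, rademacher_tail(coeffs, t), exp(-t * t / 4))
```

For each t the exact tail sits below $e^{-t^2/4}$, as the moment generating function argument predicts.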

3.3. Maximal Function Estimates 

While our primary interest is in the Littlewood Paley Theory, the maximal function
and its relevant estimates are essential to the subject.
Define

(3.3.1)  $Mf(x) = \sup_{x \in I,\ I \in \mathcal{D}} \mathbb{E}(f \mid I)$.

The principal properties of the Maximal function are

3.3.2. Theorem. We have the estimates

(3.3.3)  $\sup_{\lambda} \lambda\, \mathbb{P}(Mf > \lambda) \le \|f\|_1$,
         $\|Mf\|_p \lesssim (1 + 1/(p-1)) \|f\|_p$, $1 < p < \infty$.

The left hand side of the first inequality is referred to as the weak $L^1$ norm, and we write
it as $\|Mf\|_{1,\infty}$. More generally, we define

(3.3.4)  $\|f\|_{p,\infty} = \sup_{\lambda > 0} \lambda\, \mathbb{P}(|f| > \lambda)^{1/p}$.

As with the Orlicz norms, in certain instances these norms define sharp inequalities.

Proof of Theorem 3.3.2. This is especially easy as we are working with the dyadic
maximal function. We begin with the weak type inequality.

Fix $\lambda > 0$, and let $\mathcal{I}$ be the collection of maximal dyadic intervals with $\mathbb{E}(f \mid I) > \lambda$. By
maximality these intervals are disjoint, so

$\lambda\, \mathbb{P}(Mf > \lambda) \le \sum_{I \in \mathcal{I}} \mathbb{E} f \mathbf{1}_I \le \mathbb{E} f \le \|f\|_1$.

For the proof of the remaining inequalities, one interpolates with the obvious $L^\infty$ bound,
as is described in Stein and Weiss [33].

□
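
The weak type inequality is easy to observe on step functions. In the sketch below, which is only an illustration with hypothetical helper names, a non negative function on [0,1) that is constant on $2^m$ equal cells is stored as a list, and the dyadic maximal function is computed cell by cell by averaging over every dyadic interval containing the cell.

```python
def dyadic_maximal(f):
    """Dyadic maximal function of a step function.

    f -- list of length 2^m giving the values on equal cells of [0,1)
    Returns the list of values of Mf on the same cells.
    """
    size = len(f)
    best = list(f)                       # intervals consisting of one cell
    length = 2
    while length <= size:
        for start in range(0, size, length):
            avg = sum(f[start:start + length]) / length
            for i in range(start, start + length):
                best[i] = max(best[i], avg)
        length *= 2
    return best


def weak_type_lhs(f, lam):
    """lam * P(Mf > lam) for the step function f."""
    mf = dyadic_maximal(f)
    return lam * sum(1 for v in mf if v > lam) / len(f)


f = [8, 0, 0, 0, 1, 1, 1, 1]
print(dyadic_maximal(f), sum(f) / len(f))
```

Here $\|f\|_1 = 1.5$, and $\lambda\, \mathbb{P}(Mf > \lambda)$ stays below this value for every threshold $\lambda$.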

The norm estimate we give above, as $p \downarrow 1$, is sharp, which extrapolates to this estimate.

3.3.5. Theorem. We have the estimate

(3.3.6)  $\|Mf\|_1 \lesssim \|f\|_{L \log L}$.

Proof. This nearly follows from Proposition 3.1.7, but M is not a linear operator. Yet,
the bound for the maximal operator in Theorem 3.3.2 is equivalent to the same bound for
the family of linear operators

$Tf := \sum_{I \in \mathcal{D}} \mathbf{1}_{E(I)}\, \mathbb{E}(f \mid I)$,

where $\{E(I) : I \in \mathcal{D}\}$ is a family of pairwise disjoint sets with $E(I) \subset I$ for all I. (For a given
f, one takes E(I) to be the set of $x \in I$ for which the supremum in the definition of M is
achieved at I.)

These operators, being linear, satisfy the estimate (3.3.6), by Proposition 3.1.7. Therefore
the Theorem follows. □

There is a striking converse to this last Theorem.

3.3.7. Theorem. [E. M. Stein] If $M|f| \in L^1$, then we have $f \in L \log L$.

Proof. We can assume that $f \ge 0$. Let us first show that

(3.3.8)  $\lambda^{-1}\, \mathbb{E} f \mathbf{1}_{\{f > \lambda\}} \lesssim \mathbb{P}(Mf > \lambda)$, $\lambda > \mathbb{E} f$.

Indeed, let $\mathcal{I}$ be the collection of maximal dyadic intervals with $\{Mf > \lambda\} = \bigcup_{I \in \mathcal{I}} I$. Then,
if $x \notin \{Mf > \lambda\}$, we must have $f(x) \le \lambda$ by the Martingale Convergence Theorem. In
addition, $\lambda > \mathbb{E} f$, so no $I \in \mathcal{I}$ can be the whole interval [0,1], and the dyadic parent of I
has average at most $\lambda$. That implies that $\mathbb{E}(f \mid I) \le 2\lambda$. But then,

$\mathbb{E} f \mathbf{1}_{\{f > \lambda\}} \le \sum_{I \in \mathcal{I}} \mathbb{E} f \mathbf{1}_I \le 2\lambda \sum_{I \in \mathcal{I}} |I| = 2\lambda\, \mathbb{P}(Mf > \lambda)$.

Hence, we can estimate

$\int_{\mathbb{E} f}^{\infty} \lambda^{-1}\, \mathbb{E} f \mathbf{1}_{\{f > \lambda\}}\, d\lambda \lesssim \int_{\mathbb{E} f}^{\infty} \mathbb{P}(Mf > \lambda)\, d\lambda$,

and our conclusion follows easily from this. □

3.4. Littlewood Paley Theory

We consider the Haar basis on [0,1], given by $\{\mathbf{1}_{[0,1]}\} \cup \{h_I : I \in \mathcal{D}\}$, where we remind
the reader that $\mathcal{D}$ consists of the dyadic intervals in [0,1]. We also remind the reader that
the Haar functions are normalized to have $L^\infty$ norm one, so that our formulas are different
from most of our references.

It is important to our applications that we consider the Haar basis as one for vector
valued functions. The vector space should be a Hilbert space $\mathcal{H}$, and by $L^p_{\mathcal{H}}$ we mean the
class of measurable functions $f : [0,1] \to \mathcal{H}$ such that

$\mathbb{E} \|f\|_{\mathcal{H}}^p < \infty$.

The Haar Square Function is

$S(f) := \Bigl[ |\mathbb{E} f|^2 + \sum_{I \in \mathcal{D}} \frac{|\langle f, h_I \rangle|^2}{|I|^2} \mathbf{1}_I \Bigr]^{1/2}$.

Here, we are taking the Hilbert space norm of those terms that involve f. Of course we
have $\|f\|_2 = \|S(f)\|_2$ just by the fact that the Haar basis is an orthogonal basis.

The Littlewood Paley Inequalities are a profound extension of this equality, to an
approximate version that holds on all $L^p$, $1 < p < \infty$.


3.4.1. Littlewood Paley Inequalities. For 1 < p < 00 there are absolute constants < 

P<B P I|S(/)|| P , K V <oo 



A p < B p < 00 so that 



<3 ' 4 ' 2 > B P < 1+ ^. 
In the reverse direction, we have 

A P ||S(/)|| P <||/|| P , l<p<oo 

(3.4.3) , 

A p -1 + 1/ VF 7 !- 

We stress that these results are delicate. Burkholder [6] has shown that the best constants
in the inequality above for general martingales are $A_p^{-1} = B_p = \max\{p, q\} - 1$, with $q$ the
conjugate index of $p$. However, a Haar series is not a general martingale; it is dyadic, which
forces conditional symmetry. See [38].

The constants above are sharp. To see that $B_p \simeq \sqrt{p}$ is sharp for $p$ large, just use
the Central Limit Theorem for Rademacher random variables, or the sharpness of the
Khintchine Inequality. A duality argument shows that one can take $A_p = B_{p'}^{-1}$, where $p'$ is
the conjugate index to $p$.
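
The $\sqrt{p}$ growth behind this sharpness claim can be seen numerically. Normalized Rademacher sums converge in distribution to a standard Gaussian $g$, and $\mathbb{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$. The Python sketch below (the function name `gaussian_lp_norm` is an illustrative choice, not notation from the text) evaluates $\|g\|_p/\sqrt{p}$; it only illustrates the Gaussian moment growth and is not a proof of sharpness.

```python
from math import exp, lgamma, log, pi, sqrt

def gaussian_lp_norm(p):
    """L^p norm of a standard Gaussian, computed through log-Gamma
    so that large p does not overflow."""
    log_moment = 0.5 * p * log(2.0) + lgamma(0.5 * (p + 1)) - 0.5 * log(pi)
    return exp(log_moment / p)

for p in (2, 10, 100, 1000):
    print(p, gaussian_lp_norm(p) / sqrt(p))
```

By Stirling's formula the printed ratios tend to $e^{-1/2}$ as $p \to \infty$, so $\|g\|_p$ is comparable to $\sqrt{p}$.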

The inequality (3.4.2) holds for $0 < p < 2$, but we do not need that case, so we do not
discuss it.

Duality Principle. With the Littlewood Paley Inequalities, there is an important duality
principle which permits us to pass from one inequality to another. Let us see that we can
take

(3.4.4) $B_p = A_q^{-1}, \qquad \frac{1}{p} + \frac{1}{q} = 1.$

Assume the inequality $A_q \|S(g)\|_q \le \|g\|_q$. Fix $f \in L^p$, and choose $g \in L^q$ of norm one so
that we have

$$\begin{aligned}
\|f\|_p &= \langle f, g \rangle \\
&= \mathbb{E} f \cdot \mathbb{E} g + \sum_{I \in \mathcal{D}} \frac{\langle f, h_I \rangle \cdot \langle h_I, g \rangle}{|I|} \\
&\le \|S(f)\|_p \cdot \|S(g)\|_q \\
&\le A_q^{-1} \|S(f)\|_p .
\end{aligned}$$

So (3.4.4) holds.

The Chang Wilson Wolff Inequality. A key step in the proof of the Littlewood Paley
inequalities is to first prove the Chang Wilson Wolff inequality, [10].

3.4.5. Chang Wilson Wolff Inequality. We have the estimate below for Hilbert space valued $f$.

(3.4.6) $\|f\|_{\exp(L^2)} \lesssim \|S(f)\|_\infty .$

Proof. It is immediately clear that if we knew $\|f\|_p \lesssim \sqrt{p}\, \|S(f)\|_p$ for $p > 2$, in the Hilbert
space valued case, then the inequality (3.4.6) would follow.

Our strategy is to first prove the inequality (3.4.6) in the case that the function / is real 
valued. From this, we will deduce a quadratic inequality, which will prove the Littlewood 
Paley inequalities for large p, in the Hilbert space valued case. This will complete the proof 
of the Chang Wilson Wolff inequality as we have stated it. 

We give the proof of Chang Wilson and Wolff, in the real valued case, which they learned 
from Herman Rubin. Indeed, this proof can be regarded as the conditional version of the 
proof we have already given of the Khintchine inequalities. 

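Let us recall that a sequence of functions $g_1, g_2, \ldots$ forms a martingale iff for all $n$

$$\mathbb{E}(g_{n+1} \mid g_1, \ldots, g_n) = g_n .$$

Here, we are taking the conditional expectation of $g_{n+1}$ with respect to the sigma field
generated by $g_1, \ldots, g_n$.

Let $\mathcal{F}_n$ be the sigma field generated by the dyadic intervals of length $2^{-n}$, so that

$$f_n := \mathbb{E}(f \mid \mathcal{F}_n) = \mathbb{E} f + \sum_{|I| > 2^{-n}} \frac{\langle f, h_I \rangle}{|I|}\, h_I$$

is a dyadic martingale. We assume that $\mathbb{E} f = 0$.

For $t > 0$ we define a new martingale by the formula

$$q_n := e^{t f_n} \prod_{j=1}^{n-1} \big[ \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \big]^{-1} .$$

Of course, it is hardly obvious that $q_n$ is a martingale, and so we check this now. Clearly,
$q_n$ is $\mathcal{F}_n$ measurable. We should then check that $\mathbb{E}(q_{n+1} \mid \mathcal{F}_n) = q_n$.

$$\begin{aligned}
\mathbb{E}(q_{n+1} \mid \mathcal{F}_n) &= \mathbb{E}\Big( e^{t f_{n+1}} \prod_{j=1}^{n} \big[ \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \big]^{-1} \,\Big|\, \mathcal{F}_n \Big) \\
&= \mathbb{E}(e^{t f_{n+1}} \mid \mathcal{F}_n) \cdot \prod_{j=1}^{n} \big[ \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \big]^{-1} \\
&= \mathbb{E}(e^{t(f_{n+1} - f_n)} \mid \mathcal{F}_n) \cdot e^{t f_n} \cdot \prod_{j=1}^{n} \big[ \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \big]^{-1} \\
&= e^{t f_n} \cdot \prod_{j=1}^{n-1} \big[ \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \big]^{-1} = q_n .
\end{aligned}$$

And therefore, $\mathbb{E} q_n = 1$ for all $n$.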

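The fact that we work with a dyadic martingale enters. For we can appeal to (3.2.3) to
see that

$$\prod_{j=1}^{n-1} \mathbb{E}(e^{t(f_{j+1} - f_j)} \mid \mathcal{F}_j) \le \prod_{j=1}^{n-1} \mathbb{E}(e^{t^2 (f_{j+1} - f_j)^2} \mid \mathcal{F}_j) = \prod_{j=1}^{n-1} e^{t^2 (f_{j+1} - f_j)^2} = e^{t^2 \sum_{j=1}^{n-1} (f_{j+1} - f_j)^2} .$$

Therefore, under the assumption that $\|S(f)\|_\infty \le 1$, we see that

$$\mathbb{E} e^{t f_n - t^2} \le \mathbb{E} q_n = 1 .$$

As this holds for all $n$, we can take $n \to \infty$. Therefore, we have for $\lambda > 0$,

$$\mathbb{P}(f > \lambda) \le e^{-t\lambda}\, \mathbb{E} e^{t f} \le e^{-t\lambda + t^2} .$$

Taking $t = \lambda/2$ proves the Chang Wilson Wolff inequality in the case that $f$ is real valued.

□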
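
For the reader's convenience, here is the short optimization hidden in the choice $t = \lambda/2$, written out as a worked step. It assumes, as in the proof, that $f$ is real valued with $\mathbb{E} f = 0$ and $\|S(f)\|_\infty \le 1$; the second line applies the first to both $f$ and $-f$.

```latex
\[
\mathbb{P}(f > \lambda) \le \inf_{t > 0} e^{-t\lambda + t^{2}}
  = e^{-\lambda^{2}/4},
  \qquad \text{the infimum being attained at } t = \lambda/2 ,
\]
\[
\mathbb{P}(|f| > \lambda) \le 2\, e^{-\lambda^{2}/4}, \qquad \lambda > 0 .
\]
```

A sub-Gaussian tail of this form is equivalent, up to absolute constants, to a bound on the $\exp(L^2)$ norm, which is the content of (3.4.6) in the real valued case.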

Proof of the Littlewood Paley Inequalities. The first step is to derive a 'Good $\lambda$
Inequality,' as below. This exotic looking inequality, first devised in [7], has proven to be
a very powerful technique.

3.4.7. Good $\lambda$ Inequality. For $\lambda > 0$ we have the inequality

(3.4.8) $\mathbb{P}(Mf > 2\lambda \,;\, S(f) < \epsilon\lambda) \lesssim e^{-c\epsilon^{-2}}\, \mathbb{P}(Mf > \lambda), \qquad 0 < \epsilon < \tfrac{1}{2} .$

Here $Mf$ is the dyadic maximal function, and $0 < c < 1$ is an absolute constant. The point of the
estimate is that it holds for all $0 < \epsilon < \tfrac{1}{2}$, with the constant on the right tending to zero as $\epsilon \downarrow 0$.

Proof. Define a stopping time by

$$\tau = \min\Big\{ n : \sum_{j=1}^{n} (f_j - f_{j-1})^2 \ge \epsilon^2 \lambda^2 \Big\} .$$

As is usual, the minimum of the empty set will be taken to be $+\infty$.
Let $f_I = \mathbb{P}(I)^{-1}\, \mathbb{E} f \mathbf{1}_I$ be the average value of $f$ on $I$.
Let $\mathcal{Q}$ be the maximal dyadic intervals with $\mathbb{E} f \mathbf{1}_I > \lambda\, \mathbb{P}(I)$, so that

$$\{Mf > \lambda\} = \bigcup_{I \in \mathcal{Q}} I .$$

On each $I \in \mathcal{Q}$, consider the event $E_I := I \cap \{Mf > 2\lambda \,;\, S(f) < \epsilon\lambda\}$. This is the main point:
If $E_I$ is non-empty then $\mathbb{E} f \mathbf{1}_I \le (1 + \epsilon)\lambda\, \mathbb{P}(I)$. Indeed, let $I'$ denote the dyadic interval which
contains $I$ and is twice as long. So the average value of $f$ on $I'$ is less than $\lambda$. If our claim is
not true, then

$$|\langle f, h_{I'} \rangle| > \epsilon\lambda\, \mathbb{P}(I'),$$

so that $S(f) \ge \epsilon\lambda$ on all of $I$, contradicting $E_I$ being non empty.
Now observe that

$$\mathbb{P}(E_I) = \mathbb{P}(Mf > 2\lambda \,;\, \tau = \infty) \le \mathbb{P}(M(f_\tau - f_I) > (1 - \epsilon)\lambda) .$$

Moreover, $\|S(f_\tau)\|_\infty \le \epsilon\lambda$. Therefore, by the Chang Wilson Wolff inequality applied to the
renormalized martingale $f_\tau - f_I$,

$$\mathbb{P}(E_I) \lesssim e^{-c\epsilon^{-2}}\, \mathbb{P}(I) .$$

By summing over $I \in \mathcal{Q}$ we complete the proof. □

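There is a standard way to pass from the Good $\lambda$ Inequalities to norm inequalities,
illustrated by this computation. Since $|f| \le Mf$, it suffices to prove the estimate
$\|Mf\|_p \le B_p \|S(f)\|_p$. First observe that

$$\begin{aligned}
\mathbb{P}(Mf > 2\lambda) &\le \mathbb{P}(S(f) \ge \epsilon\lambda) + \mathbb{P}(Mf > 2\lambda \,;\, S(f) < \epsilon\lambda) \\
&\le \mathbb{P}(S(f) \ge \epsilon\lambda) + C e^{-c\epsilon^{-2}}\, \mathbb{P}(Mf > \lambda) .
\end{aligned}$$

Then, we can compute

$$\begin{aligned}
\|Mf\|_p^p &= p 2^p \int_0^\infty \lambda^{p-1}\, \mathbb{P}(Mf > 2\lambda)\, d\lambda \\
&\le p 2^p \int_0^\infty \lambda^{p-1}\, \mathbb{P}(S(f) \ge \epsilon\lambda)\, d\lambda + p 2^p C e^{-c\epsilon^{-2}} \int_0^\infty \lambda^{p-1}\, \mathbb{P}(Mf > \lambda)\, d\lambda \\
&\le (2/\epsilon)^p \|S(f)\|_p^p + 2^p C e^{-c\epsilon^{-2}} \|Mf\|_p^p .
\end{aligned}$$

Observe that if we take $\epsilon \simeq p^{-1/2}$, we can conclude

$$\|Mf\|_p^p \le (C\sqrt{p})^p\, \|S(f)\|_p^p ,$$

which proves the desired inequality.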

To recap, we have proved (3.4.2) in the range $1 < p < \infty$ for real valued functions $f$. By
the duality principle, this proves (3.4.3) in the same range.
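
The choice $\epsilon \simeq p^{-1/2}$ in the good $\lambda$ computation above can be made explicit as follows. This is a sketch under the assumptions that $\|Mf\|_p$ is finite a priori (so that it can be absorbed on the left) and that $C$, $c$ are the constants from (3.4.8); the constant $c_1$ is introduced here only for illustration.

```latex
\[
2^{p} C e^{-c\epsilon^{-2}} \le \tfrac12
\quad\Longleftrightarrow\quad
\epsilon^{-2} \ge c^{-1}\bigl( p \log 2 + \log(2C) \bigr),
\]
which holds for $\epsilon = c_1 p^{-1/2}$, $p \ge 1$, once $c_1 > 0$ is small enough. Then
\[
\|Mf\|_p^p \le (2/\epsilon)^p \|S(f)\|_p^p + \tfrac12 \|Mf\|_p^p
\quad\Longrightarrow\quad
\|Mf\|_p \le 2^{1/p}\, \frac{2}{\epsilon}\, \|S(f)\|_p \lesssim \sqrt{p}\, \|S(f)\|_p .
\]
```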

To deduce the stronger result, for Hilbert space valued functions /, we need a different 
formulation of the Chang Wilson Wolff inequality. Fefferman and Pipher [19] have devised 
an elegant proof, inspired by the work of Wilson [39]. Also see [24].

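3.4.9. Definition. For $1 < p < \infty$, and a function $w \ge 0$ on $[0,1]$, say that it is in dyadic $A^p$ if

(3.4.10) $\|w\|_{A^p} := \sup_{I \in \mathcal{D}} |I|^{-1} \|w \mathbf{1}_I\|_1 \cdot \big[ |I|^{-1} \|w^{-1/(p-1)} \mathbf{1}_I\|_1 \big]^{p-1} < \infty .$

We are especially interested in the endpoint cases. To be explicit, these are

$$\|w\|_{A^1} := \sup_{I \in \mathcal{D}} |I|^{-1} \|w \mathbf{1}_I\|_1 \cdot \big[ \inf_{x \in I} w(x) \big]^{-1} < \infty ,$$

$$\|w\|_{A^\infty} := \sup_{I \in \mathcal{D}} \sup_{x \in I} w(x) \cdot \big[ |I|^{-1} \|w \mathbf{1}_I\|_1 \big]^{-1} < \infty .$$

The functions $w \ge 0$ are 'weights' that we use to construct $L^p(w)$ spaces, with norm
$\|f\|_{L^p(w)}^p = \mathbb{E} |f|^p \cdot w$. By an abuse of notation, we will write this last expectation as

$$\mathbb{E}_w f := \mathbb{E} f \cdot w .$$

Likewise $\mathbb{P}_w(A) = \mathbb{E}_w \mathbf{1}_A$. The result we are interested in is: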

3.4.11. Theorem. We have the inequality

(3.4.12) $\|f\|_{L^2(w)} \lesssim \|w\|_{A^1}^{1/2}\, \|S(f)\|_{L^2(w)} .$

This holds for all Hilbert space valued $f$.

There are two key observations about this Theorem. First, the estimate is quadratic
in nature, a key reason for passing to this level of generality. In particular, in order to
establish this for Hilbert space valued $f$, it suffices to establish it for real valued $f$. Indeed, if $f$
takes values in a Hilbert space, then we can assume that the Hilbert space is $\ell^2$, and write
$f = (f_k : k \in \mathbb{N})$. Assuming the real valued version, we can just sum on $k$:

$$\|f\|_{L^2(w)}^2 = \sum_k \|f_k\|_{L^2(w)}^2 \lesssim \|w\|_{A^1} \sum_k \|S(f_k)\|_{L^2(w)}^2 = \|w\|_{A^1}\, \|S(f)\|_{L^2(w)}^2 .$$

So the Hilbert space case is immediate.

Second, the dependence in terms of the $A^1$ constant is sharp, which permits the
deduction of the sharp growth rate in $L^p$ constants, for $p > 2$. This is a standard argument,
following Rubio de Francia. For $p > 2$, write

$$\|f\|_p^2 \le \mathbb{E} |f|^2 \cdot \varphi$$

for some non-negative $\varphi$ with $\|\varphi\|_{(p/2)'} = 1$. We dominate $\varphi$ by an $A^1$ weight, which is given
as follows.

(3.4.13) $v := \sum_{k=0}^{\infty} \big( 2\rho((p/2)') \big)^{-k} M^k \varphi .$

In this display, $M$ denotes the dyadic maximal function, and $M^k$ denotes the $k$th power
of $M$. We interpret the 0th power to be the identity. The constant $\rho(q)$ is the norm of the
Maximal Function on $L^q$. The relevant fact for us here is that $\rho(q) \simeq (q - 1)^{-1}$ as $q \downarrow 1$. In
particular, $\rho((p/2)') \simeq p$ as $p \to \infty$. It is clear that $\|v\|_{(p/2)'} \lesssim 1$.
Now $v$ satisfies $\|v\|_{A^1} \lesssim p$, since for any dyadic interval $I$

$$\begin{aligned}
|I|^{-1} \|v \mathbf{1}_I\|_1 &= \sum_{k=0}^{\infty} \big( 2\rho((p/2)') \big)^{-k}\, |I|^{-1} \|\mathbf{1}_I M^k \varphi\|_1 \\
&\le 2\rho((p/2)') \sum_{k=1}^{\infty} \big( 2\rho((p/2)') \big)^{-k} \inf_{x \in I} M^k \varphi(x) \\
&\lesssim p \inf_{x \in I} v(x) .
\end{aligned}$$

But then, we have

$$\begin{aligned}
\|f\|_p^2 &\le \mathbb{E}_v |f|^2 \\
&\lesssim \|v\|_{A^1}\, \mathbb{E}_v S(f)^2 \\
&\lesssim p\, \mathbb{E}_v S(f)^2 \\
&\lesssim p\, \|S(f)\|_p^2 .
\end{aligned}$$

So the Littlewood Paley estimates hold for all $p > 2$, in the Hilbert space valued case.

Proof of Theorem 3.4.11. We need an additional result on the way in which $A^1$ weights
embed in $A^\infty$ weights.

3.4.14. Lemma. [Lemma 3.6, [19].] Given $0 < \eta < 1$, there is a $C > 0$ so that for all $w \in A^1$
and sets $E \subset I$ where $I$ is dyadic, we have

$$\mathbb{P}(E \mid I) < e^{-C \|w\|_{A^1}} \quad \text{implies} \quad \mathbb{P}_w(E \mid I) < \eta .$$

Proof. As we work on a probability space, we have the Hölder inequality

$$\mathbb{E} |f| \le \|f\|_p , \qquad 1 < p < \infty ,$$

as well as the Orlicz variants, $\mathbb{E} |f| \lesssim \|f\|_{L \log L}$. It is a key attribute of the weighted theory
that one can reverse some of these inequalities for weights $w \in A^p$. In the case of $A^1$ the
reverse Hölder inequality is

$$\|w\|_{L(\log L)(I; dx/|I|)} \lesssim \|w\|_{A^1}\, \|w\|_{L^1(I; dx/|I|)} .$$

This follows immediately from Theorem 3.3.7 and the definition of $A^1$.

But then we can estimate

$$\begin{aligned}
\mathbb{E}(w \mathbf{1}_E \mid I) &\lesssim \|w\|_{L(\log L)(I; dx/|I|)}\, \|\mathbf{1}_E\|_{\exp(L)(I; dx/|I|)} \\
&\lesssim \|w\|_{A^1}\, \|w\|_{L^1(I; dx/|I|)}\, \big[ \log \mathbb{P}(E \mid I)^{-1} \big]^{-1} \\
&\lesssim \eta\, \|w\|_{L^1(I; dx/|I|)} .
\end{aligned}$$

This proves our Lemma.

□

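Recall the Chang Wilson Wolff good $\lambda$ inequality

$$\mathbb{P}(Mf > 2\lambda \,;\, S(f) < \epsilon\lambda) \lesssim e^{-c\epsilon^{-2}}\, \mathbb{P}(Mf > \lambda), \qquad 0 < \epsilon < \tfrac{1}{2} .$$

Taking $\epsilon \simeq \|w\|_{A^1}^{-1/2}$, we can deduce the weighted good $\lambda$ inequality

$$\mathbb{P}_w(Mf > 2\lambda \,;\, S(f) < \epsilon\lambda) \le \eta\, \mathbb{P}_w(Mf > \lambda) .$$

And the standard way to prove the $L^2$ estimate from the good $\lambda$ inequality gives us the
inequality

$$\|f\|_{L^2(w)}^2 \lesssim \epsilon^{-2}\, \|S(f)\|_{L^2(w)}^2 \simeq \|w\|_{A^1}\, \|S(f)\|_{L^2(w)}^2 ,$$

and so the proof is done.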

Weak $L^1$ Estimate. At $L^1$, the equivalence $\|f\|_1 \simeq \|S(f)\|_1$ fails (see footnote 1).
Nevertheless, there is an endpoint estimate of interest to us. It is

3.4.15. Weak $L^1$ Bound for the Square Function. We have the inequality

(3.4.16) $\sup_{\lambda > 0} \lambda\, \mathbb{P}(S(f) > \lambda) \lesssim \|f\|_1 .$

We stress that this inequality holds for Hilbert space valued functions $f$.

Remark. Traditional approaches to these issues treat the weak $L^1$ estimate first, and
then interpolate to $L^p$. We are interested in the sharp constants for the square function,
which are not available by way of the weak $L^1$ norm.

Central to the proof of this estimate is the Calderón Zygmund Decomposition.

3.4.17. Calderón Zygmund Decomposition. Let $f \in L^1_{\mathcal{H}}$ be of norm one, and let $\lambda > 0$. Then,
we can write $f = g_1 + g_2$ so that $\|g_1\|_\infty \lesssim \lambda$, and $g_2$ is supported on disjoint dyadic intervals
$\{I_j : j \ge 1\}$, with

(3.4.18) $\sum_j |I_j| \le \lambda^{-1}, \qquad \mathbb{E}(g_2 \mid I_j) = 0 .$

Footnote 1. Instead, one has $\|Mf\|_1 \simeq \|S(f)\|_1$ where $M$ is the Maximal function. The theory of
the Hardy space $H^1$ depends critically on this equivalence.

Proof. This is a stopping time argument, but as we work on the dyadic grid, the details
simplify considerably. Take $\{I_j\}$ to be the maximal dyadic intervals such that

$$\mathbb{E}(|f| \mid I_j) > \lambda .$$

Maximality assures us that these intervals are disjoint. Since $\|f\|_1 = 1$, we have

$$\sum_j |I_j| \le \lambda^{-1} \sum_j \mathbb{E}(|f| \mathbf{1}_{I_j}) \le \lambda^{-1} .$$

Set

$$g_1(x) := \begin{cases} \mathbb{E}(f \mid I_j) & x \in I_j , \\ f(x) & \text{otherwise.} \end{cases}$$

By the Lebesgue Differentiation Theorem (or Martingale Convergence Theorem),
$\|g_1\|_\infty \lesssim \lambda$.

It is then clear that we have

$$g_2 \mathbf{1}_{I_j} = f \mathbf{1}_{I_j} - \mathbb{E}(f \mid I_j)\, \mathbf{1}_{I_j} .$$

Thus, $g_2$ satisfies all its desired properties. □
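
The stopping time construction is easy to carry out for a real valued step function on a dyadic grid. The Python sketch below is an illustration under that discretization only; the name `calderon_zygmund`, the test function and the value of `lam` are assumptions made here. Scanning scales from coarse to fine guarantees that the first interval selected along any nested chain is the maximal one.

```python
import numpy as np

def calderon_zygmund(f, lam):
    """Dyadic Calderon-Zygmund decomposition of a step function on [0, 1].

    f holds values on 2**n equal cells. Returns (g1, g2, intervals), where
    intervals lists the selected maximal dyadic intervals as (start, length),
    measured in cells."""
    f = np.asarray(f, dtype=float)
    N = f.size
    g1 = f.copy()
    covered = np.zeros(N, dtype=bool)
    intervals = []
    length = N
    while length >= 1:                        # from coarse to fine scales
        for start in range(0, N, length):
            if covered[start]:                # inside a larger selected interval
                continue
            block = slice(start, start + length)
            if np.abs(f[block]).mean() > lam: # E(|f| | I) > lambda
                intervals.append((start, length))
                covered[block] = True
                g1[block] = f[block].mean()   # g1 = E(f | I_j) on I_j
        length //= 2
    return g1, f - g1, intervals

rng = np.random.default_rng(1)
f = rng.standard_normal(1024) ** 3
f /= np.abs(f).mean()                         # normalize so that ||f||_1 = 1
lam = 3.0
g1, g2, intervals = calderon_zygmund(f, lam)
total_length = sum(length for _, length in intervals) / f.size
```

In this discrete setting one can check directly that `g1` is bounded by `2 * lam`, that `total_length` is at most `1 / lam`, and that `g2` has mean zero on each selected interval, matching (3.4.18).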

Proof of (3.4.16). Fix $f \in L^1_{\mathcal{H}}$ of norm one and $\lambda > 0$. As we work on a probability space,
we can further restrict attention to $\lambda > 1$. Apply the Calderón Zygmund Decomposition,
writing $f = g_1 + g_2$.

Note that we have

$$\mathbb{P}(S(f) > 2\lambda) \le \mathbb{P}(S(g_1) > \lambda) + \mathbb{P}(S(g_2) > \lambda),$$

so that it suffices to analyze the two terms on the right separately.
For $g_1$, we use the $L^2$ bound for the Square Function so that

$$\begin{aligned}
\lambda^2\, \mathbb{P}(S(g_1) > \lambda) &\le \|S(g_1)\|_2^2 \\
&\le \|g_1\|_2^2 \\
&= 2 \int_0^\infty u\, \mathbb{P}(|g_1| > u)\, du \\
&\lesssim \lambda .
\end{aligned}$$

This matches the required bound from (3.4.16).

The case of $g_2$ is simpler. The function $g_2$ is supported on the dyadic intervals $I_j$, and has
mean zero on each of these intervals. Thus, if $J$ is any dyadic interval that strictly contains
an $I_j$, we must have $\langle g_2, h_J \rangle = 0$. It follows that the square function of $g_2$ is supported on
the $I_j$, so that

$$\mathbb{P}(S(g_2) > 0) \le \sum_j \mathbb{P}(I_j) \le \lambda^{-1} .$$

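Our proof is complete.

□

We need further extensions of the Chang Wilson Wolff inequality, namely these
extensions, which are essentially known.

3.4.19. Theorem. For $\beta > 0$ we have

$$\|S(f)\|_{L(\log L)^{\beta}} \lesssim \|f\|_{L(\log L)^{\beta + 1/2}} .$$

Again, this holds for Hilbert space valued functions $f$.

Proof. A variant of the duality principle is useful to us. We can choose a function $g$ so
that $S(g)$ has $\exp(L^{1/\beta})$ norm one for which

$$\begin{aligned}
\|S(f)\|_{L(\log L)^{\beta}} &\simeq \mathbb{E} f \cdot \mathbb{E} g + \sum_{I \in \mathcal{D}} \frac{\langle f, h_I \rangle \cdot \langle g, h_I \rangle}{|I|} \\
&= \langle f, g \rangle \\
&\lesssim \|f\|_{L(\log L)^{\beta + 1/2}} \cdot \|g\|_{\exp(L^{(\beta + 1/2)^{-1}})} .
\end{aligned}$$

Now, by Proposition 3.1.2, and the sharp Littlewood Paley inequalities,

$$\begin{aligned}
\|g\|_{\exp(L^{(\beta + 1/2)^{-1}})} &\simeq \sup_{r > 2} r^{-\beta - 1/2}\, \|g\|_r \\
&\lesssim \sup_{r > 2} r^{-\beta}\, \|S(g)\|_r \\
&\lesssim 1 .
\end{aligned}$$

Our proof is complete. □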

3.5. Product Theory 

The product theory is a branch of Harmonic Analysis devoted to a range of issues that
are effectively analyzed with tensor products of Haar bases (see footnote 2).

To describe this, again due to the local nature of the questions, we need to slightly
modify the dyadic intervals. Before, we used $\mathcal{D}$ to denote the dyadic intervals contained
in $[0,1]$. Let us set $\mathcal{D}_+$ to be these dyadic intervals together with the interval $[0,2]$. Let us
define the Haar function associated with $[0,2]$ to be the constant function

$$h_{[0,2]} = \mathbf{1}_{[0,1]} .$$

(We could have taken these steps earlier, but it would have been confusing to do so.) Then,
$\{h_I : I \in \mathcal{D}_+\}$ is an orthogonal basis for $L^2([0,1])$.

Let us construct the tensor product basis for $L^2([0,1]^d)$. The basis elements are indexed
by $\mathcal{R}_d := \mathcal{D}_+^d$, and for $R = R_1 \times \cdots \times R_d \in \mathcal{R}_d$, set

$$h_R(x_1, \ldots, x_d) = \prod_{j=1}^{d} h_{R_j}(x_j) .$$

This is an orthogonal basis for $L^2([0,1]^d)$. As in the one parameter setting, we are interested
in the vector valued version of this space.

Footnote 2. A more typical description involves questions that are invariant under a family of
dilations of two or more parameters. Dilations don't appear in these notes due to the local nature
of the questions studied.

The Haar Square Function in this setting is

(3.5.1) $S(f) := \Big[ \sum_{R \in \mathcal{R}_d} \frac{|\langle f, h_R \rangle|^2}{|R|^2}\, \mathbf{1}_R \Big]^{1/2} .$

As in the one parameter setting, it is clear that $\|f\|_2 = \|S(f)\|_2$, and there is a deep extension
of this equivalence to all $L^p$. Again, we are interested in the version of this result which
has the sharp dependence in $p$.

3.5.2. Theorem. We have the inequalities below, valid on $[0,1]^d$.

(3.5.3) $\|f\|_p \lesssim p^{d/2}\, \|S(f)\|_p , \qquad 1 < p < \infty$

(3.5.4) $\|S(f)\|_p \lesssim (p - 1)^{-d/2}\, \|f\|_p , \qquad 1 < p < \infty .$

Proof. The Duality Principle is still in effect, and so it suffices to prove one of the set
of inequalities above. We prefer to prove the first inequalities.

The method of proof is a standard iteration of the one parameter inequalities, in the
vector valued setting, a common technique in the subject, see for instance [31,32].

Observe that the product Square Function is the composition of Square Functions
applied in each coordinate. These Square Functions are then applied to Hilbert space
valued functions. In particular, let $S_j$ be the one parameter square function applied in the
coordinate $x_j$. Then,

$$S = S_1 \circ \cdots \circ S_d .$$

Note that in applying $S_1, \ldots, S_{d-1}$, one should interpret each as applied to a Hilbert space
valued function. Namely in two dimensions, we interpret

$$S_1 f(x_1, x_2) := \Big\{ \frac{\langle f(\cdot, x_2), h_{I_1} \rangle}{|I_1|}\, \mathbf{1}_{I_1}(x_1) : I_1 \in \mathcal{D}_+ \Big\} ,$$

and one computes the $\ell^2(\mathcal{D}_+)$ norm of this quantity. Then,

$$S_2 \circ S_1 f(x_1, x_2) := \Big\{ \frac{\langle f, h_{I_1 \times I_2} \rangle}{|I_1| \cdot |I_2|}\, \mathbf{1}_{I_1}(x_1) \mathbf{1}_{I_2}(x_2) : I_1, I_2 \in \mathcal{D}_+ \Big\} ,$$

and one computes the $\ell^2(\mathcal{D}_+ \times \mathcal{D}_+)$ norm of the right hand side.

It is clear that the Theorem then follows from the one parameter Littlewood Paley
inequalities.

□

Remark. Alternately, one can use the weighted inequality in Theorem 3.4.11, applied
$d$ times. Details are left to the reader.

We briefly mention some other relevant inequalities. The weak type estimate is replaced
by

3.5.5. Theorem. We have the inequality below on $[0,1]^d$.

(3.5.6) $\|S(f)\|_{1,\infty} \lesssim \|f\|_{L(\log L)^{d}}$

The Maximal Function is

$$Mf(x) = \sup_{R \in \mathcal{R}_d} \mathbf{1}_R(x)\, \mathbb{E}(f \mid R) .$$

The principal inequalities are below, and in general are sharp.

3.5.7. Theorem. We have the inequalities

$$\|Mf\|_{L^{1,\infty}} \lesssim \|f\|_{L(\log L)^{d}} ,$$

$$\|Mf\|_p \lesssim \big( 1 + (p - 1)^{-d} \big)\, \|f\|_p .$$

As we don't use this estimate, we do not prove it.



CHAPTER 4 



Other Applications: Approximation Theory and Probability Theory 

4.1. Mixed Derivatives 

We will take an abbreviated view of the subject of this chapter, referring the reader to
references, especially [36] for more information. In $d$ dimensions, consider the map

$$\mathrm{Int}_d f(x_1, \ldots, x_d) := \int_0^{x_1} \cdots \int_0^{x_d} f(y_1, \ldots, y_d)\, dy_1 \cdots dy_d .$$

We consider this as a map from $L^p([0,1]^d)$ into $C([0,1]^d)$. Clearly, the image of $\mathrm{Int}_d$ consists
of functions with $L^p$ integrable mixed partial derivatives. Let us set

$$\mathrm{Ball}(MW^p([0,1]^d)) := \mathrm{Int}_d(\{ f \in L^p([0,1]^d) : \|f\|_p \le 1 \}) .$$

That is, this is the image of the unit ball of $L^p$. This is the unit ball of the space of functions
with mixed derivative in $L^p$. Our main theorem, Theorem 1.1.7, has consequences for the
case of $p = 1$, but in this discussion we concentrate on the case of $p = 2$, for which we have
no new results.

These sets are compact in $C([0,1]^d)$, and it is of relevance to quantify the compactness.
The traditional way to do this is through entropy numbers. For $0 < \epsilon < 1$, set $N(\epsilon)$ to be the
least number $N$ of points $x_1, \ldots, x_N \in C([0,1]^d)$ so that

$$\mathrm{Ball}(MW^2([0,1]^d)) \subset \bigcup_{n=1}^{N} x_n + \epsilon B_\infty .$$

Here, $B_\infty$ is the unit ball of $C([0,1]^d)$. An upper bound on these numbers is known,

(4.1.1) $\log N(\epsilon) \lesssim \epsilon^{-1} (\log 1/\epsilon)^{d - 1/2}$

and the task at hand is to prove that this estimate is sharp. The case of $d = 2$ below follows
from Talagrand [34].

4.1.2. Conjecture. For $d \ge 2$ one has the estimate

$$\log N(\epsilon) \simeq \epsilon^{-1} (\log 1/\epsilon)^{d - 1/2} , \qquad \epsilon \downarrow 0 .$$

How does the Small Ball Conjecture enter in? We should use a 'smooth' version of the
Small Ball Conjecture. That is, in the Small Ball Conjecture, (1.1.3), one should replace the
'rough' Haar functions by smooth variants. There is no canonical way to do this (see footnote 1),
and so we simply choose the 'spline variant' of Talagrand [34]. For a dyadic rectangle $R \in \mathcal{D}^d$ in
dimension $d$, set $u_R = \mathrm{Int}_d\, h_R$.

4.1.3. Smooth Small Ball Conjecture. For all sequences $a(R)$, we have the estimate below
valid for all integers $n$.

(4.1.4) $2^{-2n} \sum_{|R| = 2^{-n}} |a(R)| \lesssim n^{(d-2)/2}\, \Big\| \sum_{|R| = 2^{-n}} a(R)\, u_R \Big\|_\infty$

The power $2^{-2n}$ is explained by the fact that the functions $u_R$ have $L^\infty$ norm comparable
to $2^{-n}$. This inequality is true, and proved by Talagrand [34], but the methods of § 1.3 will
provide simple proofs of related facts.

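Let us explain how this conjecture provides lower bounds for entropy numbers. Given
a choice of signs $\sigma : \{R : |R| = 2^{-n}\} \to \{\pm 1\}$, we consider the functions

$$F_\sigma := n^{-(d-1)/2} \sum_{|R| = 2^{-n}} \sigma(R)\, h_R .$$

Then, the mixed derivative of $\mathrm{Int}_d F_\sigma$, namely $F_\sigma$ itself, has $L^2$ norm about 1. The point of
view is to let $\sigma$ vary to construct sets of points in $\mathrm{Int}_d(B_2)$ that are widely separated.
Suppose that for two different choices of $\sigma$ and $\sigma'$, we have

(4.1.5) $\sum_{|R| = 2^{-n}} |\sigma(R) - \sigma'(R)| \gtrsim n^{d-1} 2^n .$

Then, Conjecture 4.1.3 enters in the following way:

(4.1.6)
$$\begin{aligned}
\|\mathrm{Int}_d(F_\sigma - F_{\sigma'})\|_\infty &= n^{-(d-1)/2}\, \Big\| \sum_{|R| = 2^{-n}} (\sigma(R) - \sigma'(R))\, u_R \Big\|_\infty \\
&\gtrsim n^{-d + 3/2}\, 2^{-2n} \sum_{|R| = 2^{-n}} |\sigma(R) - \sigma'(R)| \\
&\gtrsim n^{1/2}\, 2^{-n} .
\end{aligned}$$

Thus, a collection of $F_\sigma$ satisfying (4.1.5) are uniformly separated in $L^\infty$ norm.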

Notice that we have reduced the problem to one of finding many proportional subsets
of $\mathcal{R}_n$ that are essentially disjoint from each other. This is addressed in a general fashion
by this proposition.

4.1.7. Proposition. There is a constant $c > 0$ so that for all integers $m$, there is a collection $\mathcal{A}$ of
subsets of $\{1, \ldots, m\}$ so that

(4.1.8) $\mathrm{card}(A \triangle A') \ge cm, \qquad A \ne A' \in \mathcal{A},$

(4.1.9) $\mathrm{card}(\mathcal{A}) \ge \exp(cm).$

Footnote 1. One can replace the splines below by tensor products of wavelets, or by appropriate
hyperbolic trigonometric polynomials.

Apply this proposition to the collection of dyadic rectangles $\{R : |R| = 2^{-n}\}$. Let $\mathcal{A}$ be the
corresponding subsets of this collection, thus for $A \ne A' \in \mathcal{A}$ we have $|A \triangle A'| \gtrsim n^{d-1} 2^n$. Let
$A$ also stand for the function

$$A(R) := \begin{cases} 1 & R \in A , \\ -1 & R \notin A . \end{cases}$$

Consider the collection $\{F_A : A \in \mathcal{A}\}$. Any two distinct functions in this collection obey
the estimate (4.1.6), hence it follows that

$$\log N(n^{1/2} 2^{-n}) \gtrsim \log \mathrm{card}(\mathcal{A}) \gtrsim n^{d-1} 2^n .$$

Setting $\epsilon \simeq n^{1/2} 2^{-n}$, we see that we have

(4.1.10) $\log N(\epsilon) \gtrsim \epsilon^{-1} (\log 1/\epsilon)^{d - 1/2} , \qquad \epsilon \downarrow 0 .$

This would match the known upper bound, (4.1.1). Again, this inequality is known, and
a consequence of Talagrand's work, in dimension $d = 2$.
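
The change of variables from $n$ to $\epsilon$ that produces (4.1.10) is short enough to write out. This worked step uses only the relation $\epsilon \simeq n^{1/2} 2^{-n}$ from the text, with implied constants depending on $d$.

```latex
\[
\epsilon \simeq n^{1/2}\, 2^{-n}
\quad\Longrightarrow\quad
\log(1/\epsilon) = n \log 2 - \tfrac12 \log n + O(1) \simeq n ,
\qquad 2^{n} \simeq n^{1/2}\, \epsilon^{-1} ,
\]
\[
n^{d-1}\, 2^{n} \simeq n^{d - 1/2}\, \epsilon^{-1}
  \simeq \epsilon^{-1} (\log 1/\epsilon)^{d - 1/2} .
\]
```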

A Coding Theory Result. A useful observation is that Proposition 4.1.7 is concerned
with the central issues of coding theory. Namely, each subset of $\{1, \ldots, m\}$ is identified
with a word of length $m$, in an alphabet of two colors. The condition (4.1.8) implies
that the words differ in a constant times $m$ slots, that is, that their Hamming distance is
proportionally as large as possible. And the condition (4.1.9) assures us that the code has
a large capacity. Fortunately, we can appeal to a well known result from Coding Theory
to address this proposition.

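4.1.11. Theorem. [Varshamov-Gilbert Bound]. We view $\{0,1\}^n$ as a linear vector space mod
2. It contains $V$, a linear subspace mod 2 of dimension $k$ with

$$|x - y|_{\ell^1} \ge d , \qquad x \ne y \in V ,$$

if the inequality below holds.

(4.1.12) $\sum_{j=0}^{d-2} \binom{n-1}{j} < 2^{n-k} .$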
To prove Proposition 4.1.7, in this Theorem, we take $d = k = an$, for a small constant $a$
to be chosen. Note that the left hand side of (4.1.12) is at most

$$\begin{aligned}
d \binom{n}{d-2} &\lesssim an \binom{n}{an} \\
&\simeq an\, \frac{n^{n + 1/2} e^{-n}}{(an)^{an + 1/2} e^{-an}\, ((1-a)n)^{(1-a)n + 1/2} e^{-(1-a)n}} \\
&= \sqrt{\frac{an}{1-a}}\, \big( a^a (1-a)^{1-a} \big)^{-n} \\
&< 2^{(1-a)n} .
\end{aligned}$$

Here, we are using Stirling's formula $m! \simeq m^{m + 1/2} e^{-m}$, meaning that the ratio of these two
terms approaches a non zero constant. Observe that $a^a \to 1$ as $a \to 0$, so that we can make
a choice of $a$ for which this inequality will be true for all large $n$.
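
For small word lengths, a code with the separation property (4.1.8) can be built by brute force. The Python sketch below uses Gilbert's greedy counting argument rather than the linear subspace version stated in Theorem 4.1.11, so it is an illustration of the same phenomenon and not the construction in the Theorem; the names `hamming` and `greedy_code` and the parameters `m = 10`, `d = 3` are assumptions chosen here.

```python
from itertools import product
from math import comb

def hamming(x, y):
    """Number of slots in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def greedy_code(m, d):
    """Greedily collect words of {0,1}^m with pairwise Hamming distance >= d."""
    code = []
    for word in product((0, 1), repeat=m):
        if all(hamming(word, c) >= d for c in code):
            code.append(word)
    return code

m, d = 10, 3
code = greedy_code(m, d)
ball = sum(comb(m, j) for j in range(d))      # size of a Hamming ball of radius d - 1
print(len(code), 2 ** m / ball)
```

Every word rejected by the greedy scan lies within distance $d - 1$ of a word already chosen, so the balls of radius $d - 1$ around the code cover $\{0,1\}^m$ and `len(code) * ball >= 2 ** m`. With $d$ proportional to $m$ and the Stirling estimate above, this lower bound is exponential in $m$, as required by (4.1.9).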

4.2. The Brownian Sheet 

General Gaussian Processes. A Gaussian process is a random map $X_t : T \to \mathbb{R}$ where
$T$ is some index set, so that for all finite $S \subset T$ and reals $a_s$,

$$\sum_{s \in S} a_s X_s$$

is a random variable with a Gaussian distribution. It is a fundamental property that a
mean zero Gaussian process is characterized by the covariances

$$\rho(s, t) := \mathbb{E} X_s \cdot X_t .$$

Throughout, we will be concerned with processes which almost surely have bounded
sample paths, namely

$$\mathbb{P}\big( \sup_{t \in T} |X_t| < \infty \big) = 1 .$$

The Small Ball Problem concerns estimates for the probability

$$\mathbb{P}\big( \sup_{t \in T} |X_t| < \epsilon \big), \qquad \epsilon \downarrow 0 .$$

See [22] for a survey on these types of questions.

If one is given a subset $K$ of a Hilbert space $\mathcal{H}$, then one can define an associated mean
zero Gaussian process $X_s$ for $s \in K$ by defining

$$\mathbb{E} X_s \cdot X_t := \langle s, t \rangle_{\mathcal{H}}$$

where the last inner product is the one associated with $\mathcal{H}$. This is a canonical relationship
with profound consequences: Most Gaussian processes of interest can be described in this
manner, and the Hilbert space has a function theoretic description which in turn reflects the
structure of the Gaussian process.

For instance, assume that associated with $\{X_t : t \in T\}$ are covariance kernel functions
$K_t$ and a measure $\mu$ on $T$ so that $\{K_t : t \in T\} \subset L^2(T, d\mu)$ and

$$\mathbb{E} X_s \cdot X_t = \int_T K_s\, K_t \, d\mu .$$

Let $\mathcal{H}_X$ be the $L^2(\mu)$ completion of the set of functions $\{K_t : t \in T\}$. This space is called the
Reproducing Kernel Hilbert Space associated with the Gaussian process $X_t$.

Following on the work of Talagrand, Kuelbs and Li [22] uncovered a close connection
between the Small Ball Probabilities and the covering numbers associated with the
unit ball of $\mathcal{H}_X$ in the $L^\infty(T)$ metric. We will recall this result in the particular instance of
the Brownian sheet below.

4.2.1. The Brownian Sheet. The Brownian sheet is a canonical Gaussian process
indexed by points $s \in [0,1]^d$. Calling the process $B(s)$, it is characterized by requiring it to be
a mean zero process with covariance structure

$$\mathbb{E} B(s) B(t) = \prod_{j=1}^{d} \min(s_j, t_j) .$$

Note that this covariance functional is given by

$$\mathbb{E} B(s) B(t) = \int_{[0,1]^d} \mathbf{1}_{[0,s]}(x)\, \mathbf{1}_{[0,t]}(x)\, dx .$$
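
The agreement of the two expressions for the covariance can be checked numerically in dimension $d = 2$. The Python sketch below compares $\prod_j \min(s_j, t_j)$ with a midpoint rule for the integral of the product of indicators; the helper name `box_overlap_volume`, the grid size and the sample points `s`, `t` are assumptions made for this illustration.

```python
import numpy as np

def box_overlap_volume(s, t, n=200):
    """Midpoint-rule value of the integral over [0,1]^2 of 1_[0,s](x) * 1_[0,t](x)."""
    mid = (np.arange(n) + 0.5) / n
    x1, x2 = np.meshgrid(mid, mid, indexing="ij")
    in_s = (x1 <= s[0]) & (x2 <= s[1])        # indicator of the box [0, s]
    in_t = (x1 <= t[0]) & (x2 <= t[1])        # indicator of the box [0, t]
    return np.mean(in_s & in_t)

s, t = (0.3, 0.7), (0.5, 0.4)
covariance = np.prod(np.minimum(s, t))        # prod_j min(s_j, t_j)
quadrature = box_overlap_volume(s, t)
print(covariance, quadrature)
```

The sample points lie on the grid lines, so the midpoint rule is exact here and the two printed values agree up to rounding: the intersection of the boxes $[0,s]$ and $[0,t]$ is the box with corner $(\min(s_1,t_1), \min(s_2,t_2))$.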

The Reproducing Kernel Hilbert Space associated with the Brownian sheet is $MW^2$, the
Sobolev space of functions with square integrable mixed derivatives in dimension $d$. A
particular case of the result of Kuelbs and Li [23] states that

4.2.1. Theorem. As $\epsilon \downarrow 0$ we have

(4.2.2) $-\log \mathbb{P}(\|B\|_{C([0,1]^d)} < \epsilon) \simeq \epsilon^{-2} (\log 1/\epsilon)^{\beta}$ iff $\log N(\epsilon) \simeq \epsilon^{-1} (\log 1/\epsilon)^{\beta/2}$.

Thus, the Conjecture (4.1.3) gives a result on these processes. And the form of the
relevant conjecture here is as follows.

4.2.3. Small Ball Problem for the Brownian Sheet. For dimension $d \ge 2$, we have

$$-\log \mathbb{P}(\|B\|_{C([0,1]^d)} < \epsilon) \simeq \epsilon^{-2} (\log 1/\epsilon)^{2d - 1} , \qquad \epsilon \downarrow 0 .$$

This is known for $d = 2$. For all $d \ge 3$, the upper bound on the Small Ball probabilities
is known; the issue is to obtain the appropriate lower bound. In dimension $d \ge 3$, the best
known lower bounds miss the conjecture above by a single power of $\log 1/\epsilon$.